Preface


This manual describes the SANE unit, which provides new data types and 
an extended-precision arithmetic system based on the proposed IEEE 
Standard, and the Elems unit, which provides mathematical and financial 
functions not previously available to Pascal users. 


The manual is for these groups of Pascal users: 


- Those who must calculate with more than seven decimal digits of
precision.


- Those who need extended-precision intermediate results, such as
statisticians.


- Those who must compute exactly with large integral values, such
as writers of accounting programs.


- Those who do financial computations, using data provided by
accounting programs.


Before reading this manual, you should be familiar with the Pascal
language and your particular Apple Pascal system.


Parts of the Appendixes refer exclusively to the current version of the
SANE and Elems units for the Apple II and the Apple III.


The Eye Symbol


Throughout this manual, the eye symbol is used to draw your attention to
important items of information.


Watch out! The eye indicates points you need to be cautious about.




Gray Sections 


Any chapter or section printed on a gray background discusses advanced 
features. You can skip these parts on a first reading, and refer to 
them later as needed. A casual user will have little need of these
parts of the manual. A numerical analyst will use them heavily.








Chapter 1 


Casual User's Guide 


Introduction and Overview 


This manual describes the interfaces of two Apple Pascal units: SANE, 
which supports the Standard Apple Numeric Environment (S.A.N.E.), and 
Elems, which computes some useful financial and mathematical functions. 


As its name implies, we plan to support S.A.N.E. across several future 
Apple products. S.A.N.E. gives you access to numeric facilities 
unavailable on almost any computer of the early 1980's--from 
microcomputers to extremely fast, extremely expensive supercomputers. 
The core features of S.A.N.E. are not exclusive to Apple; rather they 
are taken from Draft 10.0 of Standard 754 for Binary Floating-Point 
Arithmetic as proposed to the Institute of Electrical and Electronics 
Engineers (IEEE). Thus SANE is one of the first widely available
products with the arithmetic capabilities destined to be found on the
computers of the mid-1980's and beyond. Apple first supported the
proposed IEEE Standard in its initial release of Apple III Pascal, which 
included a single-precision implementation of Draft 8.0 of the Standard. 


The IEEE Standard specifies standardized data types, arithmetic, and 
conversions, along with tools for handling limitations and exceptions, 
that are sufficient for numeric applications. SANE and Elems go beyond 
the specifications of the IEEE Standard by including a data type 
designed for accounting applications, and by including several 
high-quality library functions for financial calculations. 


The proposed IEEE arithmetic was specifically designed to provide 
advanced features for the numerical analyst without imposing any extra 
burden on casual users. (This is an admirable but rarely attainable 
goal; text editors and word processors, for example, typically suffer 
increased complexity with added features, meaning more hurdles for the
novice to clear before completing even the simplest tasks.) The 
independence of elementary and advanced features of the IEEE arithmetic 
was carried over to the SANE unit, so that casual users need not master 
advanced features. 


If you are familiar with Pascal, you should be able to use SANE just on
the basis of the terse comments in the INTERFACE found in Appendix A. 
The rest of this chapter is an overview of SANE by means of examples and 
dialogue. We encourage you to refer to Appendix A while perusing the 
examples. 


Examples 


Two examples, a Pascal program and a Pascal unit, demonstrate the use of
SANE. We encourage you to type in these examples, to compile them, and
in the case of the program, to execute the code file while following the
discussion. (Before you can compile and execute the code file, you will
need to install the SANE unit into your system library, as explained in
Appendix B.)


Example 1 


This program reads an input string representing a floating-point value 
and echoes it to the screen. It demonstrates how data types are 
declared in SANE, and how values can be accepted on input and displayed 
on output. 


program EchoNumber;

Uses
   SANE;

Var
   InStr, OutStr : DecStr;      { Input and output strings. }
   X : Single;                  { Single value of InStr. }
   f : DecForm;                 { Specifies output format. }

begin { EchoNumber }

   f.style := FLOAT;            { Floating output format. }
   f.digits := 9;               { 9 significant digits. }

   write ('Enter number: ');
   readln (InStr);              { Read first input string. }

   while InStr <> '' do begin
      Str2S (InStr, X);         { Convert input to Single value X. }
      S2Str (f, X, OutStr);     { Convert X to string by f. }
      writeln (OutStr);
      write ('Enter number: ');
      readln (InStr)            { Read next input string. }
   end

end { EchoNumber } .


In the program EchoNumber note that 


- the input and output strings (InStr and OutStr) are of type 
DecStr, a Pascal string type defined by the SANE unit; 


- a variable X of type Single (defined in Chapter 2) has been 
declared to hold the value of the input string; 


- the variable f is of type DecForm, which specifies the format of
the output string. In this case, f is assigned so that the 
output will be in FLOAT format (as opposed to FIXED), and will 
show 9 significant digits; 


- the SANE routine Str2S converts the ASCII characters from the 
input string InStr to the Single value X; and 


- the SANE procedure S2Str converts the Single value X to the
output string OutStr. The format of this string is determined 
by the value of f. 


Throughout SANE and Elems, the names of procedures reflect the data 
types involved. For example, Str2S converts to Single. There are also 
procedures Str2D, Str2C, and Str2X for converting to the other SANE 
data types Double, Comp, and Extended, respectively.


You will note (for instance) that the input string '0.5' is echoed (as
you would expect) as '5.00000000E-1', whereas the input value '0.1' is
echoed as '1.00000001E-1'. The source of this apparent anomaly is
discussed in Chapter 4.


Now compile and execute the program, trying out various input values. 


Example 2 


The second example shows the use of SANE from another unit. If you are 
unfamiliar with Pascal units, you may want to refer to your Apple Pascal 
manual. This example also shows how expression evaluation is 
accomplished using Extended intermediate variables. 


The unit provides a procedure to evaluate the dot product of two 
vectors. The input vectors v and w (of type Vector) are represented as 
arrays of Single values. The desired result is the Single value z. In 
order to compute the value of z with maximum accuracy, all of the 
intermediate calculations are performed in extended precision. This 
feature is at the heart of the design of the SANE unit.





UNIT DotProd;


INTERFACE

Uses
   SANE;

const
   N = 20;                          { Size of Vector. }

type
   Vector = array [1..N] of Single;

Procedure DotProduct (v, w : Vector; var z : Single);


IMPLEMENTATION

Procedure DotProduct { (v, w : Vector; var z : Single) };

{ Returns the dot product of v and w in z;
  accumulated in Extended and returned in Single. }

var
   s, t : Extended;
   i : integer;

begin { DotProduct }

   I2X (0, s);                      { s <-- 0 }

   for i := 1 to N do begin
      S2X (v [i], t);               { t <-- v [i] }
      MulS (w [i], t);              { t <-- v [i] * w [i] }
                                    { Accumulate in Extended. }
      AddX (t, s)                   { s <-- s + t }
   end;
   X2S (s, z)                       { Produce Single result. }

end { DotProduct } ;


END { DotProd } .


In the procedure DotProduct note that 


- the sum s is initialized to zero using I2X (I2X provides
convenient and efficient assignment of integral constants to
Extended);


- a Single value from v is converted to extended precision in the
temporary variable t. This conversion is performed by S2X and
is exact (as discussed in Chapter 4);


- t is directly multiplied by the corresponding value from w,
leaving the extended-precision result in t;


- the sum is accumulated in extended precision by adding t
directly to the Extended value s;


- when the loop completes, the sum in s is converted, using X2S,
to the desired Single result z;


- all of the basic arithmetic operations in the SANE unit on two
values are two-address operations; that is, the operation is
performed on the two inputs and the result is stored in the
second argument (as in MulS and AddX in the example);


- all arithmetic operations are performed in extended precision
and the result is returned in Extended (the reasons for this
type of arithmetic are discussed below);


- the names of the procedures again reflect the type of the input
argument; that is, MulS multiplies an Extended by a Single,
AddX adds an Extended to an Extended, and X2S converts an
Extended to a Single.
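

The effect of accumulating in a wider format can be tried outside SANE.
The following is a minimal sketch in Python, not SANE code. It assumes
that Python's 64-bit float can stand in for Extended, and the helper
names to_single, dot_single, and dot_wide are hypothetical. The first
routine rounds every intermediate result to Single; the second keeps the
running sum in the wider format and rounds once at the end, as
DotProduct does.

```python
import struct

def to_single(x):
    """Round a Python float (IEEE double) to the nearest Single value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def dot_single(v, w):
    """Dot product with every intermediate result rounded to Single."""
    s = 0.0
    for a, b in zip(v, w):
        s = to_single(s + to_single(a * b))
    return s

def dot_wide(v, w):
    """Dot product accumulated in a wider format, rounded to Single once."""
    s = 0.0
    for a, b in zip(v, w):
        s += a * b              # stand-in for MulS and AddX on a wide sum
    return to_single(s)

v = [1.0e8, 1.0, -1.0e8]
w = [1.0, 1.0, 1.0]
print(dot_single(v, w))         # 0.0: the 1.0 is lost when added to 1.0e8
print(dot_wide(v, w))           # 1.0: the 1.0 survives in the wide sum
```

With these inputs the exact dot product is 1. Single carries 24
significant bits, so 100000001 is not representable and the narrow
accumulation discards the small term.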


Questions and Answers about SANE 


In this section, we answer several questions about SANE, to explain the 
intent of the numeric environment SANE provides, before explaining that 
environment in detail in the following chapters. 


Does SANE provide IEEE-conforming arithmetic? 


SANE supports all of the features of Draft 10.0 of the proposed 
Standard, with the exception of rounding precision. SANE supports the 
required data types, exceptions and rounding directions; conversions 
between binary and decimal; comparisons; denormalized numbers and the 
treatment of gradual underflow; as well as the basic arithmetic 
operations add, subtract, multiply, divide, square root, exact absolute 
remainder, and round to an integral value. In addition, the unit 
provides operations that are only recommended, including negate, 
absolute value, copy-sign and next-after. These operations are all
implemented to the strict specifications of the proposed Standard. The
implementation has been completely validated by test procedures
developed by members of the Standard Committee.


Don't Apple Pascal systems already have floating point?


Apple II and IIe Pascal versions 1.0 through 1.2, which provide the real 
arithmetic required by the Pascal language, do have floating-point. 
However, this arithmetic implementation does not conform to the IEEE 
standard and does not provide the high-precision data types, superbly 
accurate arithmetic results, exception facilities, or special procedures
that S.A.N.E. offers. 


Similar remarks apply to the Pascal in the first release of Lisa 
Workshop (although it meets IEEE accuracy specifications). 


Apple III Pascal 1.0 and 1.1 arithmetic and the RealModes unit are based 
upon Draft 8.0 of the IEEE Standard. This implementation contains only 
single-precision (32-bit) real arithmetic. A number of changes to the 
proposed Standard have been made since Draft 8.0. 


How is the SANE unit different? Why is it better? 


The arithmetic implemented by the SANE unit conforms to Draft 10.0 of
the proposed Standard. It supports Single and Double data types using 
extended-precision arithmetic. In addition, SANE provides a new data 
type, Comp, for performing integral arithmetic with up to 18 digits of 
precision. Like Single and Double, Comp is a storage type for Extended 
arithmetic. This type has been added to allow application writers to
compute, for instance, accounting quantities, with the required
accuracy, and within the same framework to use these values for
financial applications, such as computing compound interest to double
precision. The default modes are set so that the system is closed and
non-stop, in the sense that any SANE operation will produce a
predictable result in all cases, without causing any run-time errors.
Even under conditions such as overflow or division-by-zero, an operation 
will deliver a well-defined result and set exception flags, and 
computation will continue. The exception flags may either be 
interrogated or ignored at the programmer's choice, but no fatal error 
will occur. 


Why does SANE use procedure calls instead of infix operators?


The SANE Pascal unit represents the first step in making the Standard 
Apple Numeric Environment available to Apple users. Apple intends to 
support this environment across several future products, including full 
integration into the Pascal language. Expression evaluation using the 
SANE procedure calls is cumbersome compared with the simple and more 
natural infix notation for the arithmetics built into the current Apple 
Pascal systems. However, whether you use the SANE unit should be 
determined by the requirements of your application (this point is 
discussed in more detail in Chapter 2).


Why is the destination of SANE operations Extended? 


Arithmetic operations in SANE are based around extended precision for 
several reasons. The Extended type is the type in which arithmetic is 
performed, and the types Single, Double, and Comp are considered to be 
storage types for application data. Conversion of Single, Double, and 
Comp to and from Extended is exact and causes no loss of accuracy. This 
style of arithmetic allows operations, such as the vector dot product
given in Example 2 above, to be computed using an Extended temporary 
variable with minimum loss of accuracy, improving the quality of the 
possibly less precise end result (in Example 2, the end result was 
Single). The general approach of using Extended-based arithmetic 
follows that of forthcoming hardware chips for IEEE floating-point. 
Also, the unit interface is much simpler than it would be if operations 
of lesser precision were included.


Chapter 2


Data Types 


Single, Double, Comp, and Extended 


SANE provides three application data types--Single, Double, and Comp
(for computational)--and the arithmetic type--Extended. Single, Double,
and Extended store floating-point values and Comp stores integral
values.


Extended is called the arithmetic type because, to make expression
evaluation simpler and more accurate, SANE performs all arithmetic
operations in extended precision and delivers arithmetic results to the
Extended type. Single, Double, and Comp can be thought of as
space-saving storage types for the extended-precision arithmetic. All
values representable in Single, Double, and Comp (as well as in the
Pascal integer type) can be represented exactly in Extended. Thus
values can be moved from any of these types to Extended and back without
any loss of information.


Pascal's 16-bit integer arithmetic, used mainly for program indexing,
remains distinct from SANE arithmetic. However, any program using the
SANE unit can use Pascal integer arithmetic.


Choosing a Data Type 


Typically, picking a data type requires that you determine the
trade-offs between


- precision;
- range;
- fixed- or floating-point type;
- memory usage; and
- computational speed.


The precision, range, and memory usage for each SANE data type are shown 
in the table below. See the section "Conversions Between Binary and 
Decimal" in Chapter 4 for information on conversion problems relating to 
precision. 


Most accounting applications require a counting type that counts things 
(pennies, dollars, widgets) exactly. Accounting applications can be
implemented by converting money values into integral numbers of cents or
mils, which can be stored exactly in the Comp format. The sum,
difference, or product of any two Comps is exact if the magnitude of the
result does not exceed 2^63 - 1 (that is, 9,223,372,036,854,775,807).
This number is larger than the national debt, expressed in Argentine 
pesos. In addition, Comp values can be used in SANE floating-point 
computations, such as interest and tax evaluations. 
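

The idea of counting money in integral cents can be sketched as follows.
This is an illustration in Python, not SANE code: Python integers stand
in for Comp, and the names COMP_MAX, to_cents, and checked are
hypothetical. The sketch converts decimal amounts to whole cents and
checks a result against the magnitude limit quoted above.

```python
from decimal import Decimal

COMP_MAX = 2**63 - 1        # largest magnitude quoted for exact Comp results

def to_cents(amount):
    """Convert a decimal money string such as '19.99' to integral cents."""
    cents = Decimal(amount) * 100
    if cents != cents.to_integral_value():
        raise ValueError('amount is not a whole number of cents')
    return int(cents)

def checked(n):
    """Return n if it fits the quoted range, otherwise raise."""
    if abs(n) > COMP_MAX:
        raise OverflowError('result exceeds the Comp range')
    return n

total = checked(to_cents('19.99') + to_cents('0.01'))
print(COMP_MAX)             # 9223372036854775807
print(total)                # 2000
```

Because every quantity is an integer number of cents, the sum is exact;
no binary approximation of 19.99 or 0.01 is involved.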


Comp-type arithmetic is done internally using the Extended data type. 
There is no loss of precision, as conversion from Comp to Extended is 
always exact. However, some space can be saved by using the Comp type, 
rather than the Extended type, for storing numbers: the Comp type is 
20% shorter, as it has no exponent. Non-accounting applications will
normally be better served by the floating-point data formats.


Values Represented


The floating-point storage formats, Single, Double, and Extended, 
provide binary encodings of a sign (+ or -), an exponent, and a
significand. A represented number has the value


± significand * 2^exponent


where the significand has a single bit to the left of the binary point
(that is, 0 <= significand < 2).


Table of Types


This table describes the range and precision of the numeric data types
supported by SANE.


| Type class        | Pascal  | Application | Application | Application | Arithmetic |
| Type identifier   | integer | Single      | Double      | Comp        | Extended   |
|-------------------|---------|-------------|-------------|-------------|------------|
| Size (bytes:bits) | 2:16    | 4:32        | 8:64        | 8:64        | 10:80      |
| Binary exponent   |         |             |             |             |            |
|   Minimum         |         | -126        | -1022       |             | -16383     |
|   Maximum         |         | 127         | 1023        |             | 16383      |
| Significand       |         |             |             |             |            |
|   Bits            | 15      | 24          | 53          | 63          | 64         |
|   Decimal digits  | 4-5     | 7-8         | 15-16       | 18-19       | 19-20      |
| Decimal range     |         |             |             |             |            |
|   Min negative    | -32768  | -3.4E+38    | -1.7E+308   | ≈-9.2E18    | -1.1E+4932 |
|   Max neg norm    |         | -1.2E-38    | -2.3E-308   |             | -1.7E-4932 |
|   Max neg denorm  |         | -1.5E-45    | -5.0E-324   |             | -1.9E-4951 |
|   Min pos denorm  |         | 1.5E-45     | 5.0E-324    |             | 1.9E-4951  |
|   Min pos norm    |         | 1.2E-38     | 2.3E-308    |             | 1.7E-4932  |
|   Max positive    | 32767   | 3.4E+38     | 1.7E+308    | ≈9.2E18     | 1.1E+4932  |
| Infinities        | No      | Yes         | Yes         | No          | Yes        |
| NaNs              | No      | Yes         | Yes         | Yes         | Yes        |








Denormalized numbers, or denorms, are defined in Chapter ۰ 


Usually numbers are stored in a normalized form, to afford maximum
precision for a given significand width. Maximum precision is achieved
if the high order bit in the significand is 1 (that is,
1 <= significand < 2).


Example


In Single, the largest representable number has

significand = 2 - 2^-23
            = 1.11111111111111111111111 (binary)
exponent    = 127
value       = (2 - 2^-23) * 2^127
            ≈ 3.403 * 10^38

the smallest representable positive normalized number has

significand = 1
            = 1.00000000000000000000000 (binary)
exponent    = -126
value       = 1 * 2^-126
            ≈ 1.175 * 10^-38
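

The two Single extremes in this example can be checked numerically. The
following is a sketch in Python rather than SANE Pascal. The helper
single_from_bits is hypothetical, and the bit patterns 7F7FFFFF and
00800000 are the standard IEEE single-format encodings of the largest
finite and smallest positive normalized values; they are not taken from
this manual.

```python
import struct

def single_from_bits(bits):
    """Interpret a 32-bit pattern as an IEEE Single value."""
    return struct.unpack('>f', struct.pack('>I', bits))[0]

largest = (2 - 2.0**-23) * 2.0**127     # significand 1.11...1, exponent 127
smallest_norm = 1.0 * 2.0**-126         # significand 1.00...0, exponent -126

print(largest == single_from_bits(0x7F7FFFFF))          # True
print(smallest_norm == single_from_bits(0x00800000))    # True
print('%.3E' % largest)                                 # 3.403E+38
print('%.3E' % smallest_norm)                           # 1.175E-38
```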


Chapter 3


Arithmetic Operations 


This section discusses the arithmetic operations, add, subtract, 
multiply, divide, remainder, and square root. Exceptional cases for 
these operations are covered in Chapters 7 and 8. 


Add, Subtract, Multiply, and Divide 


The arithmetic operations add, subtract, multiply, and divide are 
provided by sixteen procedures (see Appendix A): 


AddS, AddD, AddC, AddX; 
SubS, SubD, SubC, SubX; 
MulS, MulD, MulC, MulX; 
DivS, DivD, DivC, DivX.


Each procedure has two operands. The first is always a value parameter 
of type Single, Double, Comp, or Extended, as indicated by the last 
letter of the procedure name. The second is always a variable parameter 
of Extended type that receives the result. For example, subtraction is 
provided by the procedures SubS (subtract Single), SubD (subtract
Double), SubC (subtract Comp), and SubX (subtract Extended). If x and y 
are declared by 


var x : Single; 
y : Extended; 


then the statement 
SubS (x, y); { y <-- y - x } 


causes x to be subtracted from y and the extended-precision result 
to be stored in y. 


Example 


To compute q := a / b , where a, b, and q are of type Double, 
declare: 


var a, b, q : Double;
    t : Extended;             { extended temporary }


and write:


D2X (a, t);                   { t <-- a }
DivD (b, t);                  { t <-- a / b }
X2D (t, q);                   { q <-- t }

Remainder 


The remainder operation is provided by the one procedure

procedure RemX (x : Extended; var y : Extended; var quo : integer);

The result delivered to y is the remainder r specified as follows:


When x is not equal to 0, the remainder r = y REM x is defined
regardless of the rounding direction by the mathematical relation
r = y - x * n, where n is the integral value nearest the exact
value y / x; whenever | n - y / x | = 1/2, n is even. The
remainder is always exact. If r = 0, its sign is that of y.
(Rounding direction is defined in Chapter 8.)


The third argument, quo, delivers the integer whose magnitude is 
given by the seven least significant bits of the magnitude of n, 
and whose sign is the sign of n. (Quo is useful for reducing the 
arguments of trigonometric functions, but can be ignored if not 
needed.) 

The IEEE remainder function differs from other commonly used 


remainder functions. It is chosen because it is always exact and 
because all the other remainder functions can be built from it. 
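

The definition above can be tried with a short sketch in Python (version
3.7 or later), whose math.remainder function computes the IEEE 754
remainder. This is not SANE code. The function rem_x is a hypothetical
stand-in that mirrors the argument order of RemX and recovers n from
r = y - x * n, which is adequate only for the small values used here.

```python
import math

def rem_x(x, y):
    """Return (r, quo) for r = y REM x, in the spirit of RemX (x, y, quo)."""
    r = math.remainder(y, x)            # IEEE 754 remainder of y by x
    n = int((y - r) / x)                # the integral n with r = y - x * n
    quo = (abs(n) & 127) * (-1 if n < 0 else 1)
    return r, quo

print(rem_x(2.0, 7.0))          # 7 / 2 = 3.5 is a tie, so n = 4: (-1.0, 4)
print(math.fmod(7.0, 2.0))      # truncating remainder, for comparison: 1.0
print(rem_x(3.0, 1000.0))       # n = 333, low seven bits give 77: (1.0, 77)
```

The first line shows the halfway case: n is chosen even, so the
remainder is negative, unlike the result of a truncating remainder.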


Square Root 


The square root operation is provided by 
procedure SqrtX (var x : Extended); 


for any x >= 0. The argument x is both source and destination. 
The square root of -0 is -0.


Example


To find v := square root of u, where u and v are of type Single,
declare

var u, v : Single;
    t : Extended;             { extended temporary }

and write

S2X (u, t);                   { t <-- u }
SqrtX (t);                    { t <-- sqrt (u) }
X2S (t, v);                   { v <-- t }


Chapter 4


Conversions 


Conversions to and from Extended 


Conversions between the Extended type and the other numeric types 
recognized by SANE are provided by the procedures 


I2X - integer to Extended
S2X - Single to Extended
D2X - Double to Extended
C2X - Comp to Extended

X2X - Extended to Extended
X2I - Extended to integer
X2S - Extended to Single
X2D - Extended to Double
X2C - Extended to Comp


For example, if x and y are declared by 


var x : Comp; 
y : Extended; 


then to convert a Comp-format value in x to an Extended-format in y, 
write 


C2X (x, y);                   { y <-- x }


Note that IEEE rounding into integral formats differs from most common 
rounding functions on halfway cases. With the default rounding 
direction (TONEAREST), the conversions X2I, X2C, Str2C, and Dec2C will 
round 0.5 to 0, 1.5 to 2, 2.5 to 2, and 3.5 to 4, rounding to even on 
halfway cases. (Str2C and Dec2C are discussed later in this chapter.
Rounding is discussed in detail in Chapter 8.)
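

The halfway cases listed above can be reproduced with a short Python
sketch (not SANE code). Python's built-in round also rounds halfway
cases to even, so it can stand in for the TONEAREST conversions; the
half_up list shows the more common round-half-up rule for contrast. The
variable names are illustrative only.

```python
import math

halfway = (0.5, 1.5, 2.5, 3.5)

to_nearest_even = [round(x) for x in halfway]       # ties go to even
half_up = [math.floor(x + 0.5) for x in halfway]    # ties go upward

print(to_nearest_even)      # [0, 2, 2, 4]
print(half_up)              # [1, 2, 3, 4]
```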


Conversions between SANE storage types and the Pascal real and
long-integer types are discussed in Appendixes C and E, respectively.


Exceptions


Conversions to the Extended storage type are always exact. However, 
the conversion procedures X2I, X2S, X2D, and X2C move a value from 
Extended to a storage type with less range and precision, and set the 
OVERFLOW, UNDERFLOW, or INEXACT exception flags when appropriate. As 
the integer format does not support NaNs and infinities, X2I sets the
INVALID exception flag if the first operand is a NaN, an infinity, or a 
number that overflows. In these cases the result stored for the integer 
operand is -MAXINT - 1 = -32768. If the first operand of X2C is a NaN, 
an infinity, or a number that overflows, then the result is the 
Comp-type NaN, and for infinities and overflows, the INVALID exception 
is signaled. X2X (x, y) sets the INVALID exception flag if x is a 
signaling NaN, whereas y := x does not.
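

The X2I cases described above can be summarized in a small sketch. This
is Python, not SANE code, and it is an assumption-laden model: the
function x2i and the flag names as strings are hypothetical, flags are
returned as a set rather than kept in an environment, and the sketch
assumes that INEXACT is the flag raised when rounding changes the value.

```python
import math

INVALID = 'INVALID'
INEXACT = 'INEXACT'

def x2i(x):
    """Model of the X2I cases in the text: returns (result, flags)."""
    if math.isnan(x) or math.isinf(x):
        return -32768, {INVALID}        # NaN or infinity: -MAXINT - 1
    n = round(x)                        # round to nearest, ties to even
    if n < -32768 or n > 32767:
        return -32768, {INVALID}        # overflows the 16-bit integer
    flags = {INEXACT} if n != x else set()
    return n, flags

print(x2i(2.5))                 # (2, {'INEXACT'})
print(x2i(float('inf')))        # (-32768, {'INVALID'})
print(x2i(40000.0))             # (-32768, {'INVALID'})
```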


Conversions Between Binary and Decimal


The IEEE Standard for binary floating-point arithmetic specifies the set
of numerical values representable within each floating-point format. It
is important to recognize that binary storage formats can exactly
represent the fractional part of decimal numbers in only a few cases; in
all other cases, the representation will be approximate. For example,
0.5 (decimal), or 1/2, can be represented exactly as 0.1 (binary). On
the other hand, 0.1 (decimal), or 1/10, is a repeating fraction in
binary: 0.00011001100... Its closest representation in Single is
0.000110011001100110011001101 (binary), which is closer to
0.10000000149 (decimal) than to 0.10000000000 (decimal). This explains
the apparent anomaly in the output of Example 1 in Chapter 1.
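

The difference between 0.5 and 0.1 can be observed with a sketch in
Python (not SANE code). The helper to_single is hypothetical; it rounds
a value to Single by packing it into four bytes. Decimal then displays
the exact value that is stored, and the last two lines format it to nine
significant digits. Python prints a two-digit exponent, so the text
differs slightly from the S2Str output in Chapter 1.

```python
import struct
from decimal import Decimal

def to_single(x):
    """Round a Python float (IEEE double) to the nearest Single value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

half = to_single(0.5)
tenth = to_single(0.1)

print(Decimal(half))        # 0.5
print(Decimal(tenth))       # 0.100000001490116119384765625
print('%.8E' % half)        # 5.00000000E-01
print('%.8E' % tenth)       # 1.00000001E-01
```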


As binary storage formats generally provide only close approximations 

to decimal values, it is important that conversion between the two types
be as accurate as possible. Given a rounding direction, for every 
decimal value there is a best (correctly rounded) binary value for each 
binary format. Conversely, for any rounding direction, each binary 
value has a corresponding best decimal representation for a given 
decimal format. Ideally, binary-decimal conversion should obtain this 
best value to reduce accumulated errors. The IEEE Standard specifies 
very stringent error bounds on conversions; the conversion routines in 
SANE follow more stringent bounds still. (See the IEEE Standard [8] for 
a more detailed description of error bounds.)


See Appendix G for binary-to-decimal conversion details that are 
peculiar to this version of the SANE unit. 


Converting Decimal Strings into SANE Types


The procedures Str2S, Str2D, Str2C, and Str2X convert numeric
strings into Single, Double, Comp, and Extended formats, respectively.


Example 1


To assign -0.0000253 to an Extended variable x, write


var x : Extended;


Str2X ('-2.53E-5', x);        { or Str2X ('-0.0000253', x); }


These routines are provided as a convenience for those who do not wish 
to write their own scanners. The routines parse numeric strings into 
binary storage formats. Each routine determines the value of the string 
from the longest prefix of the string that is recognized as a number. 

If no part of the string is recognized as a number or a null string is
encountered, then the routine returns a zero.


However, if the first character after leading blanks have been
discarded and the optional sign has been parsed is an 'i' or an 'I',
then the string is interpreted as an infinity. Likewise, if the first
character after leading blanks have been discarded and the optional sign
has been parsed is an 'n' or an 'N', then the string is interpreted as a
NaN.

The strings described by standard Pascal syntax are a subset of the 
strings accepted by these conversion routines. These routines accept
other strings, too (for example, they accept '.3', whereas standard 
Pascal requires a leading digit before a decimal point). 
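

The scanning rules described above can be sketched as follows. This is a
hypothetical Python model, not the SANE scanner: the function str2x, the
regular expression NUMBER, and the exact grammar it accepts are
assumptions made for illustration. It discards leading blanks, reads an
optional sign, tests for infinity and NaN, and otherwise converts the
longest prefix that looks like a number, returning zero when there is
none.

```python
import math
import re

# Assumed number syntax: digits with an optional fraction, or a fraction
# with no leading digit, followed by an optional exponent.
NUMBER = re.compile(r'(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?')

def str2x(s):
    """Model of the scanning rules in the text (not the SANE scanner)."""
    s = s.lstrip(' ')                   # discard leading blanks
    sign = 1.0
    if s[:1] in ('+', '-'):             # optional sign
        sign = -1.0 if s[0] == '-' else 1.0
        s = s[1:]
    if s[:1] in ('i', 'I'):
        return sign * math.inf
    if s[:1] in ('n', 'N'):
        return math.nan
    m = NUMBER.match(s)                 # longest numeric prefix
    if m is None:
        return 0.0                      # nothing recognized: zero
    return sign * float(m.group(0))

print(str2x('  -2.53E-5xyz'))   # -2.53e-05
print(str2x('.3'))              # 0.3
print(str2x(''))                # 0.0
```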


The Comp format has no representation for infinities; Str2C
signals INVALID and delivers a NaN whenever the string operand is an
infinity or a number that overflows the Comp format.


Converting SANE Types into Decimal Strings 


The procedures S2Str, D2Str, C2Str, and X2Str will convert a Single, 
Double, Comp, and Extended, respectively, into a numeric string (of type
DecStr). As any numeric value can have many decimal representations, 
you must specify the decimal result format. To do so, pass a record
of type DecForm, shown below:


DecForm = record
   style : (FLOAT, FIXED);
   digits : integer
end;


This record specifies two things:

- style (either FLOAT or FIXED); and

- digits (the number of significant digits for style FLOAT or the
number of digits to the right of the decimal point for style
FIXED). This number may be negative if the style is FIXED.


Example 2


To print the value of a Double variable y using a fixed-point decimal 
format with ten digits to the right of the decimal point write 


var y: Double; 
S: DecStr; 
f: DecForm; 


f.style := FIXED; 
f.digits := 10;


. . .


D2Str (f, y, s);
writeln ('y = ', s);


Numbers that round to zero in the specified DecForm are converted to the
string ' 0.0' or '-0.0'. NaNs are converted to the string " NaN''" or
"-NaN''". (Double quotes are used here because the string contains
single quotes.) Infinities are converted to the string ' INFINITY' or
'-INFINITY'.








Other numbers are formatted with no more than SIGDIGLEN significant
digits; the formatted number is padded with zeros where necessary. If
the resulting
string has more than DECSTRLEN characters, the number is represented in 
floating-point notation. (SIGDIGLEN and DECSTRLEN are dependent on the 
implementation: they are specified in the INTERFACE to the SANE unit, 
shown in Appendix A.) 


All string results have either a leading negative sign or a leading 
blank (thus, columns of numbers will line up regardless of sign). 


Decimal Record Conversions


The Decimal record type provides an intermediate canonical form,

(-1)^sgn * sig * 10^exp

for programmers who wish to do their own parsing of numeric input or
formatting of numeric output. This form is specified in the INTERFACE
for SANE:


SigDig = string [SIGDIGLEN];


Decimal = record
   sgn : 0..1;                { Sign (0 for pos, 1 for neg). }
   exp : integer;             { Exponent. }
   sig : SigDig               { String of significant digits. }
end;


The procedures S2Dec, D2Dec, C2Dec, and X2Dec convert a Single,
Double, Comp, or Extended value, respectively, into a record of type
Decimal. A DecForm operand (defined in the preceding section) specifies
the format of Decimal. Numbers that round to zero, infinities, and
NaNs are passed to the sig part of the Decimal record as '0', 'I', or
'N', respectively (the exp part of Decimal is unchanged). The maximum
number of ASCII digits passed to sig is SIGDIGLEN, and the implied 
decimal point is at the right end of sig with exp set accordingly. 


The procedures Dec2S, Dec2D, Dec2C, and Dec2X convert a Decimal record 
into Single, Double, Comp, and Extended, respectively. The sig part of 
Decimal accepts up to SIGDIGLEN significant digits with an implicit 
decimal point at the right end; however, the following exceptions are 
permitted. 


- If the first ASCII character is '0' (zero), the number is
converted to zero.


- If the first ASCII character is 'N', the number is converted to
a NaN.


- If the first ASCII character is 'I', the number is converted to
an infinity.


- If the destination is a Comp type, an infinity is converted to a
NaN, and the INVALID exception is signaled.
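

The rules above, together with the canonical form
(-1)^sgn * sig * 10^exp, can be sketched in Python. This is not SANE
code: DecimalRec is a hypothetical mirror of the Decimal record, dec2x
is a hypothetical stand-in for Dec2X that delivers a Python float, and
the Comp case with its INVALID signal is left out.

```python
import math
from collections import namedtuple

# Hypothetical mirror of the Decimal record: sgn, exp, sig.
DecimalRec = namedtuple('DecimalRec', 'sgn exp sig')

def dec2x(d):
    """Model of the Dec2X rules in the text, delivering a Python float."""
    sign = -1.0 if d.sgn == 1 else 1.0
    first = d.sig[:1]
    if first == '0':
        return sign * 0.0
    if first == 'N':
        return math.nan
    if first == 'I':
        return sign * math.inf
    # (-1)^sgn * sig * 10^exp, decimal point at the right end of sig
    return sign * float(d.sig + 'E' + str(d.exp))

print(dec2x(DecimalRec(1, -7, '253')))      # -2.53e-05
print(dec2x(DecimalRec(0, 0, 'I')))         # inf
```

The first call corresponds to the value -0.0000253 used in Example 1:
the digits 253 with the implied decimal point at the right end and an
exponent of -7.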


Chapter 5


Expression Evaluation 


The SANE floating-point unit is designed to operate on Extended values. 
For example, DivD (x, y) operates on the Extended-format value in y by 
dividing the Double-format number x into y and leaving the result in y. 
To evaluate more complicated expressions, Extended temporaries can be 
used. 


Examples 


The following examples illustrate extended-based expression evaluation. 
The first example uses an Extended accumulator to store the results of 
all operations. 


Example 1 


Compute the value of

   r = ((a + b - c) * d + e) / f

where all variables are of Double type.


var a, b, c, d, e, f, r : Double;
    t : Extended;            { extended temporary }
begin
   D2X (a, t);               { t <-- a }
   AddD (b, t);              { t <-- a + b }
   SubD (c, t);              { t <-- a + b - c }
   MulD (d, t);              { t <-- (a + b - c) * d }
   AddD (e, t);              { t <-- (a + b - c) * d + e }
   DivD (f, t);              { t <-- ((a + b - c) * d + e) / f }
   X2D (t, r);               { r <-- t }
end;

Note that although the arithmetic style is extended-based, not every
operand need be converted to Extended. In the example, only one 
explicit conversion to Extended was required. 


Example 2 


Compute the value of

   r = (-b + sqrt (b^2 - 4 * a * c)) / (2 * a)

where a, b, c, and r are of Single type.

var a, b, c, r : Single;
    t1, t2 : Extended;       { extended temporaries }
begin
   S2X (b, t1);              { t1 <-- b }
   MulS (b, t1);             { t1 <-- b^2 }
   I2X (4, t2);              { t2 <-- 4 }
   MulS (a, t2);             { t2 <-- 4 * a }
   MulS (c, t2);             { t2 <-- 4 * a * c }
   SubX (t2, t1);            { t1 <-- b^2 - 4 * a * c }
   SqrtX (t1);               { t1 <-- sqrt (b^2 - 4 * a * c) }
   SubS (b, t1);             { t1 <-- -b + sqrt (b^2 - 4 * a * c) }
   S2X (a, t2);              { t2 <-- a }
   AddS (a, t2);             { t2 <-- 2 * a }
   DivX (t2, t1);            { t1 <-- (-b + sqrt (b^2 - 4 * a * c))
                                      / (2 * a) }
   X2S (t1, r);              { r <-- t1 }
end;


Exceptional cases include b^2 < 4 * a * c and a = 0. For information on
how SANE handles these and other exceptions, see Chapters 7 and 8.


(The common formula for a root of a quadratic equation was chosen 
solely to illustrate expression evaluation. More accurate methods 
exist for solving this problem.) 


Example 3 


Evaluate the polynomial

   y = c[0] + c[1] * x + c[2] * x^2 + ... + c[n] * x^n

and its derivative

   Dy = c[1] + 2 * c[2] * x + 3 * c[3] * x^2 + ... + n * c[n] * x^(n-1)

where the coefficients c[0] through c[n] are stored in an array of
Single and x, y, and Dy are of type Single.

const NMAX = 100;

var n, i : 0..NMAX;
    x, y, Dy : Single;
    c : array [0..NMAX] of Single;
    t1,                      { For computation of y. }
    t2 : Extended;           { For computation of Dy. }

I2X (0, t1);                 { t1 <-- 0 }
t2 := t1;                    { t2 <-- 0 }

for i := n downto 1 do begin
                             { t1 <-- c [i] + x * t1 : }
   MulS (x, t1);             { t1 <-- x * t1 }
   AddS (c [i], t1);         { t1 <-- c [i] + t1 }
                             { t2 <-- t1 + x * t2 : }
   MulS (x, t2);             { t2 <-- x * t2 }
   AddX (t1, t2)             { t2 <-- t1 + t2 }
end;
                             { t1 <-- c [0] + x * t1 : }
MulS (x, t1);                { t1 <-- x * t1 }
AddS (c [0], t1);            { t1 <-- c [0] + t1 }
X2S (t1, y);                 { y <-- t1 }
X2S (t2, Dy);                { Dy <-- t2 }


The method, called Horner's Rule, used to evaluate the polynomials is
based on the polynomial representation

   y = (...((c[n] * x + c[n-1]) * x + c[n-2]) * x + ... + c[1]) * x + c[0]

It is more efficient than the straightforward computation suggested by
the standard representation, shown at the beginning of the example,
and is conveniently implemented using SANE's extended-based arithmetic.
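
For readers who want to experiment with Horner's Rule outside Apple
Pascal, the following is a minimal Python sketch of the same one-pass
scheme. It is not SANE code: Python floats (IEEE double) stand in for
the Extended temporaries, and the function name is an assumption made
for illustration.

```python
def horner_with_derivative(c, x):
    """Evaluate p(x) = c[0] + c[1]*x + ... + c[n]*x**n and p'(x) in one pass."""
    t1 = 0.0  # accumulates p(x), playing the role of the temporary t1
    t2 = 0.0  # accumulates p'(x), playing the role of the temporary t2
    for i in range(len(c) - 1, 0, -1):  # i = n downto 1
        t1 = c[i] + x * t1
        t2 = t1 + x * t2
    t1 = c[0] + x * t1
    return t1, t2


# p(x) = 1 + 2x + 3x^2, so p(2) = 17 and p'(2) = 2 + 6*2 = 14
y, dy = horner_with_derivative([1.0, 2.0, 3.0], 2.0)
print(y, dy)
```

Each coefficient is touched once, and no explicit powers of x are
formed, which is the efficiency gain the text describes.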


Global Constants


To speed up execution, constants in expressions in often-used routines 
can be defined globally (outside the routines). For example, if pi is
declared and defined by

var pi : Extended;
   . . .
begin
   Str2X ('3.14159265358979323846', pi);

then executing

   x := pi;

is significantly faster than

   Str2X ('3.14159265358979323846', x);

Defining constants globally is particularly helpful when the definition
is via one of the string conversion routines, such as Str2X, which are
designed for generality rather than speed. For conversion of integers,
I2X is significantly faster than Str2X.
Chapter 6


Comparisons 


Comparison Functions 


Any two floating-point values in the Extended format can be compared 
using 


function CmpX (x : Extended; r : RelOp; y : Extended) : boolean; 
or 
function RelX (x, y : Extended) : RelOp; 


The RelOp values are

   GT      greater than
   LT      less than
   GL      greater than or less than
   EQ      equal
   GE      greater than or equal
   LE      less than or equal
   GEL     greater than, equal, or less than
   UNORD   unordered


Single, Double, or Comp values can be compared by first converting them 
to Extended. 


Operands are unordered whenever one or both of the operands is a NaN.
(NaNs are discussed in Chapter 7.) For every pair of operand values, 
exactly one of the relations LT, GT, EQ, and UNORD is true. The value 
of RelX is the appropriate one of these four relations.  CmpX (x, r, y) 
is true if and only if the relation x r y is true. 


Example

If p is greater than q then print 'p > q is TRUE'; otherwise, print
'p > q is FALSE'.

var p, q : Extended;

if CmpX (p, GT, q) then
   writeln ('p > q is TRUE')
else
   writeln ('p > q is FALSE');

Note that equivalent results are produced by

if CmpX (p, LE, q) or CmpX (p, UNORD, q) then
   writeln ('p > q is FALSE')
else
   writeln ('p > q is TRUE');

or by

case RelX (p, q) of
   GT:
      writeln ('p > q is TRUE');
   LT, EQ:
      writeln ('p > q is FALSE');
   UNORD:
      begin
         SetXcp (INVALID, TRUE); { See next section. }
         writeln ('p > q is FALSE')
      end { UNORD }
end; { case RelX }


Comparisons Involving Infinities and NaNs 


+INFINITY is greater than any finite number and -INFINITY.  -INFINITY is 
less than any finite number and +INFINITY. +INFINITY equals +INFINITY 
and -INFINITY equals -INFINITY. The zeros, +0 and -0, are equal. 


CmpX (x, r, y) signals the INVALID (invalid-operation) exception if x or
y is a NaN and r is a relational operator involving "<" or ">": namely
GT, LT, GL, GE, LE, or GEL.
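
The four-way relation and the eight RelOp predicates can be modeled in a
few lines of Python. This is only a sketch of the comparison rules
described in this chapter: the names rel_x and cmp_x are hypothetical
stand-ins for RelX and CmpX, Python doubles stand in for Extended, and
the INVALID signal is not modeled.

```python
import math


def rel_x(x, y):
    """Return exactly one of 'LT', 'GT', 'EQ', 'UNORD' for the pair (x, y)."""
    if math.isnan(x) or math.isnan(y):
        return "UNORD"
    if x < y:
        return "LT"
    if x > y:
        return "GT"
    return "EQ"


# Each RelOp name maps to the set of basic relations that satisfy it.
RELOPS = {
    "GT": {"GT"},
    "LT": {"LT"},
    "GL": {"GT", "LT"},
    "EQ": {"EQ"},
    "GE": {"GT", "EQ"},
    "LE": {"LT", "EQ"},
    "GEL": {"GT", "EQ", "LT"},
    "UNORD": {"UNORD"},
}


def cmp_x(x, r, y):
    """True if and only if the relation x r y holds."""
    return rel_x(x, y) in RELOPS[r]


print(rel_x(0.0, -0.0))                 # the zeros compare equal
print(rel_x(math.inf, 1e308))           # +INFINITY exceeds any finite number
print(cmp_x(math.nan, "GT", 1.0))       # false: the operands are unordered
print(cmp_x(math.nan, "UNORD", 1.0))    # true
```

Because exactly one basic relation holds for every pair, "not GT" is the
same as "LE or UNORD", which is why the second form in the example above
needs both tests.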
Chapter 7


Infinities, NaNs, and Denormalized 
Numbers 


In addition to the normalized numbers supported by most floating-point 
packages, IEEE floating-point arithmetic supports three other kinds of 
values: infinities, NaNs, and denormalized numbers. 


Infinities 


When a SANE operation attempts to produce a number whose magnitude is
too large for its result's format, the result may (depending on the
rounding direction) be a special bit pattern called an infinity. These
bit patterns (as well as NaNs, introduced next) are recognized in
subsequent operations and produce predictable results. The infinities,
one positive and one negative, generally behave as suggested by the
theory of limits. For example, 1 added to +INFINITY yields +INFINITY;
-1 divided by +0 yields -INFINITY; and 1 divided by -INFINITY yields -0.


The modeling of mathematical infinities is not perfect, however: for
example, adding finite numbers can overflow, producing infinities. In
overflows and in many other cases, the infinities may be regarded as
undetermined very large finite numbers.


Each of the storage types Single, Double, and Extended provides unique
representations for +INFINITY and -INFINITY. The Comp type has no
representations for infinities. (An infinity moved to the Comp type
becomes a NaN.)


NaNs 


When a floating-point operation cannot produce a meaningful result, the 
operation delivers a special bit pattern called a NaN (Not-a-Number). 
For example, 0 divided by 0 and +INFINITY added to -INFINITY yield NaNs. 
A NaN can occur in any of the SANE storage types: Single, Double, 
Extended, and Comp. The Pascal integer (16-bit) storage type has no
representation for NaNs. NaNs propagate through arithmetic operations.
Thus the result of 3.0 added to a NaN is the NaN. If two operands of an
operation are NaNs, the result is one of the NaNs. NaNs are of two
kinds: quiet NaNs, the usual kind produced and propagated by
floating-point operations, and signaling NaNs. When a signaling NaN is
encountered as an operand of an arithmetic operation, the INVALID
(invalid-operation) exception is signaled and, if no halt occurs, a
quiet NaN is produced for the result. Signaling NaNs could be used for
uninitialized variables. They are not created by any SANE operations.
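
The behavior of infinities and quiet NaNs can be observed in any
IEEE-conforming arithmetic. The short Python sketch below uses Python
doubles rather than the SANE types, so it illustrates the rules, not the
SANE implementation. Python raises an error on float division by zero,
so the NaN here is produced by adding infinities of opposite sign.

```python
import math

inf = math.inf
nan = inf + (-inf)            # (+INFINITY) + (-INFINITY) has no meaningful result

print(math.isnan(nan))        # the result is a NaN
print(math.isnan(3.0 + nan))  # NaNs propagate through arithmetic
print(nan == nan)             # a NaN is unordered, even with itself
print(1.0 + inf == inf)       # 1 added to +INFINITY yields +INFINITY
print(math.copysign(1.0, 1.0 / -inf))  # 1 / -INFINITY yields -0
print(1e308 * 10.0)           # finite operands can overflow to an infinity
```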


Denormalized Numbers 


Whenever possible, floating-point numbers are normalized to keep the
leading significand bit 1: this maximizes the resolution of the storage
type. In many current systems of floating-point arithmetic, the
smallest representable number is a normalized number with the minimum
exponent; when the result of an operation is smaller than this smallest
normalized number, the system delivers zero as the result.


As an alternative to this flush-to-zero scheme, IEEE-standard
floating-point arithmetic uses gradual underflow. When a number is too
small for a normalized representation, leading zeros are placed in the
significand to produce a denormalized representation. A denormalized
number is a nonzero number that is not normalized and whose exponent is
the minimum exponent for the storage type.


The example below shows how a Single value becomes progressively
denormalized as it is repeatedly divided by 2, with rounding to nearest.


   A0             = 1.100 1100 1100 1100 1100 1101 * 2^-126
   A1  = A0 / 2   = 0.110 0110 0110 0110 0110 0110 * 2^-126  (underflow)
   A2  = A1 / 2   = 0.011 0011 0011 0011 0011 0011 * 2^-126
   A3  = A2 / 2   = 0.001 1001 1001 1001 1001 1010 * 2^-126  (underflow)
      .
      .
      .
   A22 = A21 / 2  = 0.000 0000 0000 0000 0000 0011 * 2^-126
   A23 = A22 / 2  = 0.000 0000 0000 0000 0000 0010 * 2^-126  (underflow)
   A24 = A23 / 2  = 0.000 0000 0000 0000 0000 0001 * 2^-126
   A25 = A24 / 2  = 0.0                                      (underflow)


A1 through A24 are denormalized; A24 is the smallest positive
denormalized value.


Although denormalized numbers differ from ordinary normalized numbers in
having less storage precision, they participate in the arithmetic in a
reasonable way and provide a valuable extension of the range of
floating-point numbers. In some cases, the use of denormalized numbers
allows a program to return an acceptable result, whereas under a
flush-to-zero system the program would have returned a spurious result.


(A program that relies on flush-to-zero to exit a loop when the value of
a variable becomes so small that it underflows may have to be modified
to run correctly under IEEE arithmetic.)
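
Gradual underflow is easy to observe by repeated halving. The sketch
below is in Python and uses doubles (53 significant bits, minimum
normalized exponent -1022) instead of the Single format, so the step
count differs from the Single case, but the mechanism is the same.

```python
import math
import sys

x = sys.float_info.min      # smallest positive normalized double, 2**-1022
steps = 0
smallest = x
while x > 0.0:
    smallest = x            # last nonzero value seen
    x = x / 2.0             # rounded to nearest, halfway cases to even
    steps += 1

print(steps)                                 # halvings needed to reach zero
print(smallest == math.ldexp(1.0, -1074))    # smallest denormalized double
```

Under a flush-to-zero system the loop would stop after a single halving.
Under gradual underflow it continues through every denormalized value,
which is the kind of loop the parenthetical remark above warns about.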


Inquiries: NumClass and the Class Functions 


The functions ClassS, ClassD, ClassC, and ClassX can be used to classify
the value of a variable. These functions are of type NumClass and
return one of the values:


   SNAN     - signaling NaN
   QNAN     - quiet NaN
   INFINITE - infinity
   ZERO     - zero
   NORMAL   - normalized number
   DENORMAL - denormalized number
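
A rough Python analogue of such a classification is sketched below. The
function name, the tuple it returns, and the sign encoding (0 for
positive, 1 for negative) are assumptions made for this illustration and
are not taken from the SANE interface. Python offers no portable way to
tell signaling NaNs from quiet ones, so every NaN is reported as QNAN.

```python
import math
import sys


def classify(x):
    """Return (class_name, sign) for a Python float; sign is 0 for +, 1 for -."""
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    if math.isnan(x):
        return "QNAN", sign          # signaling NaNs are not distinguished here
    if math.isinf(x):
        return "INFINITE", sign
    if x == 0.0:
        return "ZERO", sign
    if abs(x) < sys.float_info.min:  # below the smallest normalized double
        return "DENORMAL", sign
    return "NORMAL", sign


for value in (1.5, -0.0, 5e-324, -math.inf, math.nan):
    print(value, classify(value))
```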
The class functions also return the sign of a value as a variable
Chapter 8


Environmental Control


Environmental controls include the rounding direction, as well as
exception flags and their corresponding halts. Except for conversions
between binary and decimal (whose slightly weaker conditions are
described in Chapter 4), all arithmetic operations are computed as if
with infinite precision and then rounded to the destination format
according to the current rounding direction.


Rounding Direction


The rounding directions are of the type

   RoundDir = (TONEAREST, UPWARD, DOWNWARD, TOWARDZERO)

The rounding direction affects all conversions and arithmetic operations
except comparison and remainder. The rounding direction is set by the
SetRnd and SetEnv procedures and can be interrogated by the GetRnd
function.


The default rounding direction is TONEAREST. In this direction the
representable value nearest to the infinitely precise result is
delivered; if the two nearest representable values are equally near, the
one with least significant bit zero is delivered. Hence, halfway cases
round to even when the destination is an integer type (X2I, X2C, Str2C,
Dec2C) and when RintX is used. If the magnitude of the infinitely
precise result exceeds the format's largest value (by at least one half
unit in the last place), then the corresponding signed infinity is
delivered.


The other rounding directions are UPWARD, DOWNWARD, and TOWARDZERO.
When rounding UPWARD, the result is the format's value (possibly
INFINITY) closest to and no less than the infinitely precise result.
When rounding DOWNWARD, the result is the format's value (possibly
-INFINITY) closest to and no greater than the infinitely precise result.
When rounding TOWARDZERO, the result is the format's value closest to
and no greater in magnitude than the infinitely precise result. To
truncate a number to an integral value, use TOWARDZERO rounding with
X2I, X2C, Str2C, Dec2C, or RintX.
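
The effect of the four directions on conversion to an integral value can
be sketched with Python's decimal module, whose rounding modes
correspond to the four directions for this purpose. The mapping table
and the function name are illustrative assumptions; the sketch works on
decimal strings rather than on Extended values.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN)

# Assumed correspondence between RoundDir values and decimal rounding modes.
MODES = {
    "TONEAREST": ROUND_HALF_EVEN,
    "UPWARD": ROUND_CEILING,
    "DOWNWARD": ROUND_FLOOR,
    "TOWARDZERO": ROUND_DOWN,
}


def round_to_integral(text, direction):
    """Round the decimal string `text` to an integer in the given direction."""
    return int(Decimal(text).quantize(Decimal("1"), rounding=MODES[direction]))


for value in ("2.5", "3.5", "-2.5", "-2.7"):
    print(value, {d: round_to_integral(value, d) for d in MODES})
```

Note how the halfway cases 2.5 and 3.5 round to the even neighbors 2 and
4 under TONEAREST, and how TOWARDZERO truncates regardless of sign.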
Example

The common rounding function specified by

              trunc (x + 0.5), if x >= 0
   Rnd (x) =
              trunc (x - 0.5), if x < 0

can be implemented by

function Rnd (x : Extended) : integer;

   { Sets INVALID and returns -32768 if
        x is a NaN or x <= -32768.5 or x >= 32767.5.
     Sets INEXACT if
        -32768.5 < x < 32767.5 and x is not integral.
     Sets no other exceptions. }

var t : Extended;
    i : integer;
    r : RoundDir;

begin
   Str2X ('0.5', t);
   CpySgnX (t, x);           { t <-- +0.5 if x > 0 or x is +0 }
                             { t <-- -0.5 if x < 0 or x is -0 }
   r := GetRnd;              { Save rounding direction. }
   SetRnd (TOWARDZERO);      { Set round-toward-zero. }
   AddX (x, t);              { t <-- x + t }
   X2I (t, i);               { i <-- truncate (t) }
   I2X (i, t);               { No exceptions! }
   SetXcp (INEXACT, not (CmpX (t, EQ, x) or TestXcp (INVALID)));
                             { Correct INEXACT setting. }
   SetRnd (r);               { Restore rounding direction. }
   Rnd := i                  { On INVALID, i <-- -32768. }
end; { Rnd }


Exception Flags and Halts


The exception flags are values of the type

   Exception = (INVALID, UNDERFLOW, OVERFLOW, DIVBYZERO, INEXACT)

These five exceptions are signaled when detected, and if the
corresponding halt is set the program will halt. Initially all
exception flags and halts are cleared. You can examine or set
individual exception flags and halts using TestXcp and TestHlt functions
and SetXcp and SetHlt procedures. The SetEnv and GetEnv procedures can
be used to set or get the entire environment (rounding direction,
exception flags, and halts).


Exceptions 


The INVALID (invalid operation) exception is signaled if an operand is 
invalid for the operation to be performed. The result is a quiet NaN, 
provided the destination is Single, Double, Extended, or Comp. The 
invalid operations are 


   1. Addition or subtraction: magnitude subtraction of infinities,
      for example, (+INFINITY) + (-INFINITY);

   2. Multiplication: 0 times INFINITY;

   3. Division: 0/0 or INFINITY/INFINITY;

   4. Remainder: RemX (x, y, q), where 'x' is zero or 'y' is
      infinite;

   5. Square root if the operand is less than zero;

   6. Conversion to an integer or Comp format (procedures X2I, X2C,
      Str2C, and Dec2C) when an overflow, infinity, or NaN precludes
      a faithful representation in that format (see Chapter 4 for
      details);

   7. Comparison via predicates involving "<" or ">" when at least
      one operand is a NaN; and

   8. Any operation on a signaling NaN except the sign manipulation
      procedures NegX, AbsX, and CpySgnX, and the class procedures
      ClassS, ClassD, ClassX, and ClassC.


The DIVBYZERO (division-by-zero) exception is signaled if a finite
nonzero number is divided by zero. It is also signaled, in the
more general case, when an operation on finite operands produces
an exact infinite result: for example, LogbX (0) returns
-INFINITY and signals DIVBYZERO.


If an operation on finite operands overflows to produce an inexact
infinite result, the DIVBYZERO exception is not signaled.


The OVERFLOW exception is signaled whenever the destination
format's largest finite number is exceeded in magnitude by what
would have been the rounded floating-point result were the
exponent range unbounded.


The UNDERFLOW exception is signaled when a result is both tiny and
inexact (and therefore, perhaps significantly less accurate than
it would be if the exponent range were unbounded). A result is
considered tiny if, before rounding, its magnitude is smaller than its
format's smallest positive normalized number.


The INEXACT exception is signaled if the rounded result of an
operation is not identical to the mathematical (exact) result or
if the result overflows.

Arithmetic on infinities is always exact and therefore signals no
exceptions, except as described in the above section on invalid
operations.
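
The idea of sticky exception flags with optional halts has a close
analogue in Python's decimal module, where a context carries flags and
traps. The sketch below uses that module only as an illustration of the
model; it is decimal arithmetic, not SANE, and the context parameters
(9 digits, exponent limits of 999) are arbitrary assumptions.

```python
from decimal import (Context, Decimal, DivisionByZero, Inexact,
                     InvalidOperation, Overflow)

# No traps are enabled, so exceptions only raise flags (no "halts").
ctx = Context(prec=9, Emax=999, Emin=-999, traps=[])
SIGNALS = (InvalidOperation, DivisionByZero, Overflow, Inexact)

q = ctx.divide(Decimal(1), Decimal(0))        # finite nonzero number / 0
print(q, ctx.flags[DivisionByZero])

ctx.clear_flags()
z = ctx.divide(Decimal(0), Decimal(0))        # 0 / 0 is an invalid operation
print(z, ctx.flags[InvalidOperation])

ctx.clear_flags()
t = ctx.divide(Decimal(1), Decimal(3))        # the result must be rounded
print(t, ctx.flags[Inexact])

ctx.clear_flags()
big = ctx.multiply(Decimal("1E999"), Decimal(10))   # exceeds the exponent range
print(big, ctx.flags[Overflow], ctx.flags[Inexact])

ctx.clear_flags()
s = ctx.add(Decimal("Infinity"), Decimal(1))  # arithmetic on infinity is exact
print(s, any(ctx.flags[sig] for sig in SIGNALS))
```

As in the description above, an overflow also raises the inexact flag,
while exact arithmetic on an infinity raises no flag at all.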


Managing Environmental Settings 


The environmental settings in the SANE unit are global and can be
explicitly changed by the user. Thus all routines inherit these
settings and are capable of changing them. If this is undesirable
because either (a) a routine requires its own settings or
(b) a routine's settings are not intended to propagate outside the
routine, then special precautions must be taken. For example, you
may want a routine to set its own rounding direction and halt
settings while not influencing the environment of the calling
routines. (For a more complete explanation and examples, see
Appendix D.)
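
The save-and-restore discipline can be sketched in Python with the
decimal module's localcontext, which saves the current settings on entry
and restores them on exit. This is an analogy only; the routine name and
the precision of 5 digits are assumptions made for the example.

```python
from decimal import (Decimal, ROUND_DOWN, ROUND_HALF_EVEN, getcontext,
                     localcontext)


def truncated_quotient(a, b):
    """Divide with round-toward-zero without disturbing the caller's settings."""
    with localcontext() as ctx:      # caller's environment is saved here
        ctx.rounding = ROUND_DOWN    # routine-private rounding direction
        ctx.prec = 5                 # routine-private precision
        return Decimal(a) / Decimal(b)


getcontext().rounding = ROUND_HALF_EVEN
before = getcontext().rounding
result = truncated_quotient(2, 3)
after = getcontext().rounding

print(result)            # quotient truncated to 5 digits
print(before == after)   # the caller's rounding direction is untouched
```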





Chapter 9 


Auxiliary Procedures 


The SANE Unit includes a set of special routines: RintX, NegX, AbsX, 
CpySgnX, NextS, NextD, NextX, ScalbX, and LogbX. With the exception of 
RintX, which is required by the Standard, these routines are only 
recommended as aids to programming in an appendix to the Standard. 


Round to Integral Value 


An Extended variable can be rounded to an integral value by

   procedure RintX (var x : Extended);

The integral value is to extended precision, and is set according to the
current rounding direction. The result is returned in the input x.


Sign Manipulation 


Procedures NegX, AbsX, and CpySgnX each operate on an Extended variable, 
altering only the sign of the Extended argument. 


The negation operation is provided by

   procedure NegX (var x : Extended);

which changes the sign of x.

The absolute value operation is provided by

   procedure AbsX (var x : Extended);

which makes the sign of x positive.

An operation to copy the sign of one Extended variable to the sign of
another is provided by

   procedure CpySgnX (var x : Extended; y : Extended);

which copies the sign of y into the sign of x.


These operations are treated as nonarithmetic in the sense that 
signaling NaNs do not signal the INVALID exception. 


Next-After 


The floating-point values representable in Single, Double, and Extended
formats constitute a finite set of real numbers. The procedures NextS,
NextD, and NextX each generate the next representable neighbor in its
respective format, given an initial value and a direction. The first
argument (x) to each of these routines is 'bumped' to the next
representable value in the direction of the second argument (y). If
x = y, the result is x.


   procedure NextS (var x : Single; y : Single);

The procedure NextS bumps the Single value x to the next representable
Single value in the direction of y.


   procedure NextD (var x : Double; y : Double);

The procedure NextD bumps the Double value x to the next representable
Double value in the direction of y.


   procedure NextX (var x : Extended; y : Extended);

The procedure NextX bumps the Extended value x to the next representable
Extended value in the direction of y.


Special Cases and Exceptions in Next-After Procedures 


The following special cases can arise:

   - If x = y, the result is x; no exception is signaled.

   - If either x or y is a quiet NaN, the result is one or the other
     of the input NaNs.

   - If x is finite but the next representable number is infinite,
     OVERFLOW and INEXACT are signaled.

   - If the next representable number lies strictly between -M and
     +M, where M is the smallest positive normalized number for that
     format, and if x is not equal to y, UNDERFLOW and INEXACT are
     signaled.
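
The next-after operation on doubles is available in Python 3.9 and later
as math.nextafter, which makes the special cases easy to try. The sketch
below illustrates the values only; Python does not report the OVERFLOW,
UNDERFLOW, or INEXACT signals described above.

```python
import math
import sys

up = math.nextafter(1.0, 2.0)         # next double above 1.0
same = math.nextafter(1.0, 1.0)       # x = y: the result is x
tiny = math.nextafter(0.0, 1.0)       # smallest positive denormalized double
top = math.nextafter(sys.float_info.max, math.inf)  # steps onto +INFINITY
quiet = math.nextafter(math.nan, 1.0)                # a NaN operand yields a NaN

print(up - 1.0 == sys.float_info.epsilon)
print(same, tiny, top, quiet)
```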


Binary Scale and Log 

Two procedures, ScalbX and LogbX, are provided for manipulating the
binary exponent of an Extended variable.

An Extended variable can be efficiently scaled by a power of two by

   procedure ScalbX (n : integer; var y : Extended);

The procedure ScalbX computes y * 2^n and returns it in y. Note
that the magnitude of n can be greater than the largest binary exponent
in extended precision (that is, 16383), as the value 2^n is not
explicitly computed. In fact, a denormalized value y can be scaled by
MAXINT (that is, ScalbX (MAXINT, y)) without causing overflow.


The binary exponent of an Extended variable can be determined by

   procedure LogbX (var x : Extended);

The procedure LogbX returns in x the binary exponent of x as a signed
integral value. (When the old x is denormalized, the exponent is
determined as if the old x had first been normalized.)


LogbX of a NaN returns the NaN. LogbX of an infinity is +INFINITY.
LogbX of zero is -INFINITY and signals the DIVBYZERO exception.
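
Both operations can be imitated on Python doubles with math.ldexp and
math.frexp. The helper names scalb and logb below are hypothetical, and
the sketch covers only finite nonzero arguments; it does not reproduce
the special cases for zero, infinities, and NaNs listed above.

```python
import math


def scalb(n, y):
    """Return y * 2**n without forming 2**n explicitly."""
    return math.ldexp(y, n)


def logb(x):
    """Return the binary exponent of a finite nonzero x as a float."""
    # frexp gives x = mantissa * 2**exponent with 0.5 <= |mantissa| < 1,
    # so the exponent of the leading bit is one less.
    _mantissa, exponent = math.frexp(x)
    return float(exponent - 1)


print(scalb(4, 3.0))            # 3 * 2^4
print(logb(8.0), logb(0.75))    # exponents of 1.0 * 2^3 and 1.5 * 2^-1
print(logb(5e-324))             # denormalized: treated as if normalized
print(scalb(2000, 5e-324))      # n exceeds the largest exponent, no overflow
```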
The Elems Unit


The Elems unit provides a number of mathematical functions, including
logarithms, exponentials, two important financial functions,
trigonometric functions, and a random number generator. The logarithms
and exponentials are provided in base-2 and base-e versions. All Elems
procedures, except RandomX, handle NaNs, overflows, and underflows
appropriately. All Elems procedures signal INEXACT appropriately,
except that XpwrY, Annuity, and Compound may signal INEXACT on exact
results.
Logarithms


The procedures Log2X, LnX, and Ln1X each operate on an Extended
variable, returning the result in the input argument.


The base-2 logarithm log2 (x) is computed by

   procedure Log2X (var x : Extended);

for any non-negative x.

If x = +INFINITY, then Log2X sets x to +INFINITY and sets no exceptions.
If x = 0, then Log2X sets x to -INFINITY and sets the DIVBYZERO
exception. If x < 0, then Log2X sets x to a NaN and sets the INVALID
exception.


The natural (base-e) logarithm ln (x) is computed by

   procedure LnX (var x : Extended);

for any non-negative x.

If x = +INFINITY, then LnX sets x to +INFINITY and sets no exceptions.
If x = 0, then LnX sets x to -INFINITY and sets the DIVBYZERO exception.
If x < 0, then LnX sets x to a NaN and sets the INVALID exception.



The natural (base-e) logarithm ln (1 + x) is computed by

   procedure Ln1X (var x : Extended);

for any x >= -1.


If x = +INFINITY, then Ln1X sets x to +INFINITY and sets no exceptions.
If x = -1, then Ln1X sets x to -INFINITY and sets the DIVBYZERO
exception. If x < -1, then Ln1X sets x to a NaN and sets the INVALID
exception.


The method of computing this value does not explicitly add 1 to x, and
so is not equivalent to

   I2X (1, one);             { one <-- 1 }
   AddX (one, x);            { x <-- 1.0 + x }
   LnX (x);

where one is an Extended variable. Procedure Ln1X is especially useful
for handling financial applications. If the input argument x is a small
positive value, such as an interest rate, the computation of Ln1X (x) is
more precise than the sequence above, because no precision is lost in x
by the addition of 1.
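
The loss of precision caused by forming 1 + x can be measured directly.
The Python sketch below uses math.log1p, the standard library's
counterpart of this kind of function on doubles, and compares it with
the add-then-log sequence for a small rate chosen for illustration.

```python
import math

rate = 1e-10                      # a small per-period interest rate
naive = math.log(1.0 + rate)      # 1.0 + rate is rounded before the log is taken
accurate = math.log1p(rate)       # ln (1 + rate) without forming 1 + rate

print(naive, accurate)
print(abs(naive - accurate) / accurate)   # relative error of the naive form
```

For this rate the naive form keeps only about seven correct digits,
because the low-order digits of the rate are lost when it is added to 1.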




Exponentials 


Procedures Exp2X, ExpX, and Exp1X each operate on an Extended variable,
and return the result in the input argument. Procedure XpwrI operates
on an Extended variable, using an integer value, and returns the result
in the Extended input argument. Procedure XpwrY operates on two
Extended variables, and returns the result in the second input argument.


   procedure Exp2X (var x : Extended);

The procedure Exp2X calculates 2^x and returns this value to x.

If x = +INFINITY, then Exp2X sets x to +INFINITY. If x = -INFINITY,
then Exp2X sets x to 0. Neither case sets any exceptions.


   procedure ExpX (var x : Extended);

The procedure ExpX computes e^x and returns this value to x.

If x = +INFINITY, then ExpX sets x to +INFINITY. If x = -INFINITY, then
ExpX sets x to 0. Neither case sets any exceptions.


   procedure Exp1X (var x : Extended);

The procedure Exp1X computes e^x - 1 and returns this value to x.

If x = +INFINITY, then Exp1X sets x to +INFINITY. If x = -INFINITY,
then Exp1X sets x to -1. Neither case sets any exceptions.
This procedure, like Ln1X, is especially useful for small input
arguments, as the result is computed without explicitly subtracting 1
from e^x; thus, the computation is more precise than if ExpX were used.
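
The same effect can be seen on doubles with Python's math.expm1. The
sketch compares it with the subtract-after-exponential form for a small
argument chosen for illustration.

```python
import math

x = 1e-10
naive = math.exp(x) - 1.0     # the subtraction cancels most significant digits
accurate = math.expm1(x)      # e^x - 1 computed without the subtraction

print(naive, accurate)
print(abs(naive - accurate) / accurate)   # relative error of the naive form
```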

   procedure XpwrI (i : integer; var x : Extended);

The procedure XpwrI computes x^i and returns this value to x.

If x is normal, denormal, infinite, or zero, then XpwrI (0, x) returns
x = 1; in particular, if x = 0 or x is infinite, then x = 1.


   procedure XpwrY (y : Extended; var x : Extended);

The procedure XpwrY computes x^y and returns this value to x.

XpwrY sets x to a NaN and signals INVALID if

   - both x and y equal 0;

   - x = 1 and y is infinite; or

   - x is negative or -0 and y is nonintegral.

If x is +0 and y is negative, then XpwrY sets x to +INFINITY and sets
the DIVBYZERO exception. If x is -0 and y is integral and negative,
then XpwrY sets x to +INFINITY if y is even, or to -INFINITY if y is
odd, and sets the DIVBYZERO exception.


Financial Functions 


The Elems unit provides two procedures, Compound and Annuity, that can
be used to solve various financial problems. Each of these procedures
takes two input arguments of type Extended, and produces an Extended
result. The two input arguments, r and n, represent in each case an
interest rate and a number of periods, respectively.


Compound Interest


Compound interest can be computed using

   procedure Compound (r, n : Extended; var x : Extended);

This procedure computes the value

   x = (1 + r)^n,

where r is the interest rate and n is the number of periods.


If r < -1, then Compound sets x to a NaN and sets the INVALID exception.
If r = 0 and n is infinite, then Compound sets x to a NaN and sets the
INVALID exception. If r = -1 and n < 0, then Compound sets x to
+INFINITY and sets the DIVBYZERO exception.


If PV is the present value of a given amount of principal to be invested
at the rate of interest r for n periods, then FV, the future value of
this principal, is

   FV = PV * (1 + r)^n.


Example 


If $1000 is invested for 6 years at 9% compounded quarterly, then what
is the future value of the principal? Compute


var r, n, four, years, rate, PV, FV : Extended;
    f : DecForm;
    s : DecStr;

   . . .

with f do begin style := FIXED; digits := 2 end;

I2X (4, four);               { four <-- 4 }
I2X (6, years);              { years <-- 6 }
Str2X ('0.09', rate);        { rate <-- 9% }
I2X (1000, PV);              { PV <-- 1000.00 }
r := rate;
DivX (four, r);              { r <-- rate / 4 }
n := years;
MulX (four, n);              { n <-- 4 * years }
Compound (r, n, FV);         { FV <-- (1 + r)^n }
MulX (PV, FV);               { FV <-- PV * (1 + r)^n }
X2Str (f, FV, s);            { f is FIXED with 2 fraction digits. }
writeln ('FV = $', s);

The future value FV is $ 1705.77.


Note that since the future value FV = PV * (1 + r)^n, then the present
value PV = FV * (1 + r)^(-n).


Example 


How much must a person invest today at 9% compounded quarterly to have
$15,000 in his account in 6 years? Assuming f, rate, years, r, and n
have the same values as in the example above, compute

var r, n, nn, four, years, rate, PV, FV : Extended;
    f : DecForm;
    s : DecStr;


I2X (15000, FV);             { FV <-- 15000.00 }
nn := n;
NegX (nn);                   { nn <-- -n }
Compound (r, nn, PV);        { PV <-- (1 + r)^-n }
MulX (FV, PV);               { PV <-- FV * (1 + r)^-n }
X2Str (f, PV, s);            { f is FIXED with 2 fraction digits. }
writeln ('PV = $', s);


The present value PV is $ 8793.70.
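
The two compound-interest examples can be cross-checked with a short
Python sketch. The function below is a stand-in written for this
illustration, not the Elems implementation; it assumes that (1 + r)^n
may be computed as exp (n * ln (1 + r)) with math.log1p, which is valid
for r > -1.

```python
import math


def compound(r, n):
    """Return (1 + r)**n for r > -1, computed as exp(n * ln(1 + r))."""
    return math.exp(n * math.log1p(r))


r = 0.09 / 4          # quarterly rate
n = 4 * 6             # number of quarters in 6 years

fv = 1000.0 * compound(r, n)       # future value of $1000
pv = 15000.0 * compound(r, -n)     # present value of $15,000

print(f"FV = ${fv:.2f}")
print(f"PV = ${pv:.2f}")
```

Rounded to two fraction digits, the results agree with the values
1705.77 and 8793.70 quoted in the examples.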


Value of an Annuity 


The present value and future value of an annuity can be computed using 


procedure Annuity (r, n : Extended; var x : Extended); 


This procedure computes the value

   x = (1 - (1 + r)^(-n)) / r,

where r is the interest rate and n is the number of periods.


If r = 0, then the procedure computes the sum of 1 + 1 + ... + 1 over n
periods, and therefore returns x = n, and no exceptions are set (this
value n corresponds to the limit as r approaches 0). If r < -1, then
Annuity sets x to a NaN and sets the INVALID exception. If r = -1 and
n > 0, then Annuity sets x to +INFINITY and sets the DIVBYZERO
exception.

This procedure, together with the procedure Compound, can be used to
solve a variety of financial problems. An annuity is a sequence of
equal payments made at equal time intervals, such as loan payments,
stock and bond dividends, or life insurance premiums. The present value
of an annuity is the sum of the present values of the several payments,
each discounted to the beginning of the term. This value can be
expressed as

   PV = PMT * (1 - (1 + r)^(-n)) / r,

where PMT is one payment.
Example


Suppose that a loan at 12% compounded monthly is to be paid off at a
rate of $225 per month in 36 months. What is the present value of the
loan? Compute


var r, n, twelve, rate, PV, PMT : Extended;
    f : DecForm;
    s : DecStr;


with f do begin style := FIXED; digits := 2 end;

I2X (12, twelve);            { twelve <-- 12 }
Str2X ('0.12', rate);        { rate <-- 12% }
Str2X ('36', n);             { n <-- 36 }
I2X (225, PMT);              { PMT <-- 225.00 }
r := rate;
DivX (twelve, r);            { r <-- rate / 12 }
Annuity (r, n, PV);          { PV <-- (1 - (1 + r)^-n) / r }
MulX (PMT, PV);              { PV <-- PMT * (1 - (1 + r)^-n) / r }
X2Str (f, PV, s);            { f is FIXED with 2 fraction digits. }
writeln ('PV = $', s);

The present value PV is $ 6774.19.

The future value of an annuity is the sum of the compound amounts of the
payments, each accumulated to the end of the term. This can be
expressed as

   FV = PMT * ((1 + r)^n - 1) / r.

This value is just

   FV = PMT * (1 + r)^n * (1 - (1 + r)^(-n)) / r,

and so can be computed accurately using the procedures Compound and
Annuity.


Example

If $50 is deposited each month to a savings account that pays 12%
compounded monthly, what is the future value of the account after 10
years? Compute


var r, n, twelve, rate, years, FV, PMT, t : Extended;
    f : DecForm;
    s : DecStr;

   . . .

with f do begin style := FIXED; digits := 2 end;

I2X (12, twelve);            { twelve <-- 12 }
Str2X ('0.12', rate);        { rate <-- 12% }
I2X (10, years);             { years <-- 10 }
I2X (50, PMT);               { PMT <-- 50.00 }
r := rate;
DivX (twelve, r);            { r <-- rate / 12 }
n := years;
MulX (twelve, n);            { n <-- years * 12 }
Compound (r, n, t);          { t <-- (1 + r)^n }
Annuity (r, n, FV);          { FV <-- (1 - (1 + r)^-n) / r }
MulX (t, FV);                { FV <-- ((1 + r)^n - 1) / r }
MulX (PMT, FV);              { FV <-- PMT * ((1 + r)^n - 1) / r }
X2Str (f, FV, s);            { f is FIXED with 2 fraction digits. }
writeln ('FV = $', s);


The final value FV is $ 11501.93.
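
Both annuity examples can be cross-checked in Python. The functions
below are illustrative stand-ins, not the Elems implementation; they
assume r > -1 and use math.log1p and math.expm1 so that small rates do
not lose precision.

```python
import math


def compound(r, n):
    """Return (1 + r)**n for r > -1."""
    return math.exp(n * math.log1p(r))


def annuity(r, n):
    """Return (1 - (1 + r)**(-n)) / r, or the limit n when r is 0."""
    if r == 0.0:
        return float(n)
    return -math.expm1(-n * math.log1p(r)) / r


monthly = 0.12 / 12

pv_loan = 225.0 * annuity(monthly, 36)                           # loan example
fv_savings = 50.0 * compound(monthly, 120) * annuity(monthly, 120)  # savings

print(f"PV = ${pv_loan:.2f}")
print(f"FV = ${fv_savings:.2f}")
```

Rounded to two fraction digits, the results agree with the values
6774.19 and 11501.93 quoted in the examples.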


Trigonometric Functions 


The trigonometric functions are provided by the procedures CosX, SinX,
TanX, and ATanX (arctangent or inverse tangent), which operate on an
extended variable and return the result in the input argument.


The cosine is computed by 


procedure CosX (var x : Extended); 


The sine is computed by


procedure SinX (var x : Extended);


If x is infinite, then SinX delivers a NaN and signals INVALID.


The tangent is computed by


procedure TanX (var x : Extended); 


If x is infinite, then TanX delivers a NaN and signals INVALID.


CosX, SinX, and TanX use an argument reduction based on RemX (see
Chapter 3) and the best extended-precision approximation to pi / 2.

The arctangent is computed by


procedure ATanX (var x : Extended);


Number results from ATanX lie between -pi / 2 and pi / 2.


If x = +INFINITY, then ATanX delivers the nearest Extended approximation
to pi / 2. If x = -INFINITY, then it delivers the nearest Extended
approximation to -pi / 2.
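
The idea of a remainder-based argument reduction can be illustrated in Python. This is a sketch under stated assumptions, not the Elems implementation: math.remainder plays the role of RemX (both compute the IEEE remainder), the double-precision value of pi / 2 replaces the extended-precision one, and the function names are hypothetical.

```python
import math

HALF_PI = math.pi / 2      # double-precision approximation to pi / 2

def reduce_quadrant(x):
    """Return (r, q): r is the IEEE remainder of x by pi/2, q the quadrant 0..3."""
    r = math.remainder(x, HALF_PI)       # |r| <= pi/4
    n = round((x - r) / HALF_PI)         # integer quotient used by the remainder
    return r, n % 4

def sin_reduced(x):
    """Sine computed from the reduced argument and the quadrant."""
    r, q = reduce_quadrant(x)
    if q == 0:
        return math.sin(r)
    if q == 1:
        return math.cos(r)
    if q == 2:
        return -math.sin(r)
    return -math.cos(r)

print(sin_reduced(10.0), math.sin(10.0))
```

For very large arguments the accuracy of such a scheme is limited by the accuracy of the stored approximation to pi / 2, which is why the choice of that constant matters.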


Random Number Generator 


Pseudorandom numbers are provided by


procedure RandomX (var x : Extended);


A sequence of pseudorandom integral values r in the range


1 <= r <= 2^31 - 2


can be generated by initializing an Extended variable r to an integral
value (the seed) in the range and making repeated calls RandomX (r);
each call delivers in r the next pseudorandom number in the sequence.


RandomX uses the iteration formula


r <-- (7^5 * r) mod (2^31 - 1)


If seed values of r are non-integral or outside the range


1 <= r <= 2^31 - 2


then results are unspecified.
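
The iteration formula is easy to try out with ordinary integers. The Python sketch below is illustrative only: random_x and uniform01 are hypothetical names, and raising an error for a bad seed is a choice made here (the manual says only that results are unspecified).

```python
M = 2**31 - 1      # modulus
A = 7**5           # multiplier, 16807

def random_x(r):
    """One step of r <-- (7^5 * r) mod (2^31 - 1)."""
    if not (isinstance(r, int) and 1 <= r <= M - 1):
        raise ValueError("seed must be an integer in 1 .. 2^31 - 2")
    return (A * r) % M

def uniform01(r):
    """Scale an integral value in 1 .. 2^31 - 2 into the open interval (0, 1)."""
    return r / M

r = 1018375230     # arbitrary seed in range
for _ in range(3):
    r = random_x(r)
    print(r, uniform01(r))
```

Because the modulus is prime and the seed is nonzero, every value produced stays in the range 1 to 2^31 - 2, so the scaled values never reach 0 or 1.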


Example


A procedure yielding a pseudorandom rectangular distribution on (0, 1):


Exterior to the procedure declare and initialize


const seed = '1018375230';   { arbitrary seed }

var P, one, r : Extended;

I2X (1, one);                { one <-- 1 }
P := one;                    { P <-- 1 }
ScalbX (31, P);              { P <-- 2^31 }
SubX (one, P);               { P <-- 2^31 - 1 }
Str2X (seed, r);             { r <-- seed }


The desired procedure can be written


procedure Rand (var x : Extended);
begin
    RandomX (r);             { r <-- random integral value }
    x := r;                  { x <-- r }
    DivX (P, x)              { normalize to (0, 1) }
end;


Appendix A
The SANE and Elems Interfaces


Here are the INTERFACE sections of the SANE and Elems units.


{$C Copyright Apple Computer, Inc., 1983 }

UNIT Sane { Standard Apple Numeric Environment } ;

INTRINSIC CODE 23 DATA 24;

INTERFACE

CONST
    SIGDIGLEN = 28;          { Maximum length of SigDig. }
    DECSTRLEN = 80;          { Maximum length of DecStr. }

TYPE
{-------------------------------------------------------------------
** Numeric types.
-------------------------------------------------------------------}
    Single   = array [0..1] of integer;
    Double   = array [0..3] of integer;
    Comp     = array [0..3] of integer;
    Extended = array [0..4] of integer;

{-------------------------------------------------------------------
** Decimal string type and intermediate decimal type,
** representing the value (-1)^sgn * 10^exp * sig
-------------------------------------------------------------------}
    SigDig  = string [SIGDIGLEN];
    DecStr  = string [DECSTRLEN];
    Decimal = record
                  sgn : 0..1;       { Sign (0 for pos, 1 for neg) }
                  exp : integer;    { Exponent }
                  sig : SigDig      { String of significant digits }
              end;

{-------------------------------------------------------------------
** Modes, flags, and selections.
-------------------------------------------------------------------}
    Environ   = integer;
    RoundDir  = (TONEAREST, UPWARD, DOWNWARD, TOWARDZERO);
    RelOp     = (GT, LT, GL, EQ, GE, LE, GEL, UNORD);
              {  >   <   <>  =   >=  <=  <=>  }
    Exception = (INVALID, UNDERFLOW, OVERFLOW, DIVBYZERO, INEXACT);
    NumClass  = (SNAN, QNAN, INFINITE, ZERO, NORMAL, DENORMAL);
    DecForm   = record
                    style  : (FLOAT, FIXED);
                    digits : integer
                end;

{-------------------------------------------------------------------
** Two address, extended-based arithmetic operations.
-------------------------------------------------------------------}
procedure AddS (x : Single;   var y : Extended);
procedure AddD (x : Double;   var y : Extended);
procedure AddC (x : Comp;     var y : Extended);
procedure AddX (x : Extended; var y : Extended);
    { y := y + x }
procedure SubS (x : Single;   var y : Extended);
procedure SubD (x : Double;   var y : Extended);
procedure SubC (x : Comp;     var y : Extended);
procedure SubX (x : Extended; var y : Extended);
    { y := y - x }
procedure MulS (x : Single;   var y : Extended);
procedure MulD (x : Double;   var y : Extended);
procedure MulC (x : Comp;     var y : Extended);
procedure MulX (x : Extended; var y : Extended);
    { y := y * x }
procedure DivS (x : Single;   var y : Extended);
procedure DivD (x : Double;   var y : Extended);
procedure DivC (x : Comp;     var y : Extended);
procedure DivX (x : Extended; var y : Extended);
    { y := y / x }
function CmpX (x : Extended; r : RelOp; y : Extended) : boolean;
    { CmpX := x r y }
function RelX (x, y : Extended) : RelOp;
    { x RelX y, where RelX in [GT, LT, EQ, UNORD] }

{-------------------------------------------------------------------
** Conversions between Extended and the other numeric types,
** including the type integer.
-------------------------------------------------------------------}
procedure I2X (x : integer;  var y : Extended);
procedure S2X (x : Single;   var y : Extended);
procedure D2X (x : Double;   var y : Extended);
procedure C2X (x : Comp;     var y : Extended);
procedure X2X (x : Extended; var y : Extended);
    { y := x (arithmetic assignment) }
procedure X2I (x : Extended; var y : integer);
procedure X2S (x : Extended; var y : Single);
procedure X2D (x : Extended; var y : Double);
procedure X2C (x : Extended; var y : Comp);
    { y := x (arithmetic assignment) }

{-------------------------------------------------------------------
** Conversions between the numeric types and the intermediate
** decimal type.
-------------------------------------------------------------------}
procedure S2Dec (f : DecForm; x : Single;   var y : Decimal);
procedure D2Dec (f : DecForm; x : Double;   var y : Decimal);
procedure C2Dec (f : DecForm; x : Comp;     var y : Decimal);
procedure X2Dec (f : DecForm; x : Extended; var y : Decimal);
    { y := x (according to the format f) }
procedure Dec2S (x : Decimal; var y : Single);
procedure Dec2D (x : Decimal; var y : Double);
procedure Dec2C (x : Decimal; var y : Comp);
procedure Dec2X (x : Decimal; var y : Extended);
    { y := x }

{-------------------------------------------------------------------
** Conversions between the numeric types and strings.
** (These conversions have a built-in scanner/parser to convert
** between the intermediate decimal type and a string.)
-------------------------------------------------------------------}
procedure S2Str (f : DecForm; x : Single;   var y : DecStr);
procedure D2Str (f : DecForm; x : Double;   var y : DecStr);
procedure C2Str (f : DecForm; x : Comp;     var y : DecStr);
procedure X2Str (f : DecForm; x : Extended; var y : DecStr);
    { y := x (according to the format f) }
procedure Str2S (x : DecStr; var y : Single);
procedure Str2D (x : DecStr; var y : Double);
procedure Str2C (x : DecStr; var y : Comp);
procedure Str2X (x : DecStr; var y : Extended);
    { y := x }

{-------------------------------------------------------------------


** Numerical library procedures and functions.
-------------------------------------------------------------------}
procedure RemX (x : Extended; var y : Extended; var quo : integer);
    { (new y) := (old y) - x * n, where n is the integer closest
      to y / x (n is even in case of tie).
      quo := low order seven bits of the integer quotient n,
      so that -127 <= quo <= 127. }
procedure SqrtX (var x : Extended);
    { x := sqrt (x) }
procedure RintX (var x : Extended);
    { x := rounded to integral value of x }
procedure NegX (var x : Extended);
    { x := -x }
procedure AbsX (var x : Extended);
    { x := |x| }
procedure CpySgnX (var x : Extended; y : Extended);
    { x := x with the sign of y }

procedure NextS (var x : Single;   y : Single);
procedure NextD (var x : Double;   y : Double);
procedure NextX (var x : Extended; y : Extended);
    { x := next representable value from x toward y }

function ClassS (x : Single;   var sgn : integer) : NumClass;
function ClassD (x : Double;   var sgn : integer) : NumClass;
function ClassC (x : Comp;     var sgn : integer) : NumClass;
function ClassX (x : Extended; var sgn : integer) : NumClass;
    { sgn := sign of x (0 for pos, 1 for neg) }

procedure ScalbX (n : integer; var y : Extended);
    { y := y * 2^n }
procedure LogbX (var x : Extended);
    { returns unbiased exponent of x }

procedure SetRnd (r : RoundDir);
procedure SetEnv (e : Environ);

function  GetRnd : RoundDir;
procedure GetEnv (var e : Environ);

function  TestXcp (x : Exception) : boolean;
procedure SetXcp (x : Exception; OnOff : boolean);
function  TestHlt (x : Exception) : boolean;
procedure SetHlt (x : Exception; OnOff : boolean);


{$C Copyright Apple Computer, Inc., 1983 }


UNIT Elems;

INTRINSIC CODE 18 DATA 19;

INTERFACE

USES SANE { Standard Apple Numeric Environment } ;

procedure Log2X (var x : Extended);
    { x := log2 (x) }

procedure LnX (var x : Extended);
    { x := ln (x) }

procedure Ln1X (var x : Extended);
    { x := ln (1 + x) }

procedure Exp2X (var x : Extended);
    { x := 2^x }

procedure ExpX (var x : Extended);
    { x := e^x }

procedure Exp1X (var x : Extended);
    { x := e^x - 1 }

procedure XpwrI (i : integer; var x : Extended);
    { x := x^i }

procedure XpwrY (y : Extended; var x : Extended);
    { x := x^y }

procedure Compound (r, n : Extended; var x : Extended);
    { x := (1 + r)^n }

procedure Annuity (r, n : Extended; var x : Extended);
    { x := (1 - (1 + r)^-n) / r }

procedure ATanX (var x : Extended);
    { x := arctan (x) }

procedure SinX (var x : Extended);
    { x := sin (x) }

procedure CosX (var x : Extended);
    { x := cos (x) }

procedure TanX (var x : Extended);
    { x := tan (x) }

procedure RandomX (var x : Extended);
    { x := (7^5 * x) mod (2^31 - 1) }
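
The comparison declarations in the SANE interface (RelOp, CmpX, and RelX) can be read as follows: RelX reports which one of the four basic relations holds, and CmpX asks whether that relation is among those named by its RelOp argument. The Python sketch below illustrates that reading only; the names rel_x and cmp_x are hypothetical, Python floats stand in for Extended, and any exception signaling that SANE comparisons perform is not modeled.

```python
import math

# Each RelOp names the set of basic relations it accepts.
REL_OPS = {
    "GT": {"GT"},
    "LT": {"LT"},
    "GL": {"GT", "LT"},
    "EQ": {"EQ"},
    "GE": {"GT", "EQ"},
    "LE": {"LT", "EQ"},
    "GEL": {"GT", "EQ", "LT"},
    "UNORD": {"UNORD"},
}

def rel_x(x, y):
    """Return the one basic relation that holds between x and y."""
    if math.isnan(x) or math.isnan(y):
        return "UNORD"               # a NaN is unordered with everything
    if x > y:
        return "GT"
    if x < y:
        return "LT"
    return "EQ"

def cmp_x(x, r, y):
    """True if the relation between x and y is one of those named by r."""
    return rel_x(x, y) in REL_OPS[r]

nan = float("nan")
print(rel_x(1.0, 2.0), cmp_x(nan, "GEL", 1.0), cmp_x(nan, "UNORD", nan))
```

Note that GEL is false exactly when the operands are unordered, so it is the complement of UNORD.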
Appendix B


Installing the SANE and Elems Units 


Before you can compile or execute a program that uses SANE, the SANE 
unit must be either in the SYSTEM.LIBRARY file on the system volume or 
in the program library file. To use the Elems unit, both the SANE and 
Elems units must be either in the SYSTEM.LIBRARY on the system volume or
in the program library. 


To use SANE, a program must have a USES declaration containing the 
identifier SANE immediately after the program heading. For example, the 
following USES declaration makes the public declarations of SANE 
available to the program:


Program Calculate; 


uses SANE; 


. . .


To use the Elems unit, a program must have a USES declaration containing 
both the identifiers SANE and Elems immediately after the program 
heading. As the Elems unit uses the SANE unit, SANE must appear in the 
USES declaration before Elems. For example, the following USES 
declaration makes all the public declarations of both Elems and SANE 
available to the program: 


Program Calculate; 


uses SANE, Elems; 


. . .


On your Apple Pascal SANE disk (an Apple II Pascal disk), you will find
two versions of the SANE unit: SANE:APPL2.SANE.CODE for Apple II's and
SANE:APPL3.SANE.CODE for Apple III's. You will also find one version of
the Elems unit for both Apple II's and Apple III's: SANE:ELEMS.CODE.
Use the LIBRARY.CODE program to move the appropriate SANE unit and the
Elems unit to SYSTEM.LIBRARY or to a program library. To specify the
pathname of a program library that contains SANE or Elems, use the
$USING option for Apple III's or the $U option for Apple II's.


Some Apple II Pascal systems require that a program using the SANE unit
include the $S option in order to compile.


See the Apple II Pascal Language Reference Manual for a description of
program libraries and of the $U and $S compiler options.


See Volume 1 of the Apple III Pascal Programmer's Manual for a
description of program libraries and Volume 2 for a description of the
$USING compiler option.


Appendix C


SANE and Built-in Pascal Arithmetic 


SANE and Apple III Pascal RealModes


When you use the SANE unit with Apple III Pascal, two distinct
floating-point systems are operative. The floating-point environment of
SANE is totally separate from that provided by Apple III Pascal and
accessed by the RealModes unit. Each has its own rounding direction,
exception flags, and halt settings, and each has its own declared types
and routines for manipulating the environment. For example,


SetXcp (INVALID, FALSE); 


from the SANE interface, clears the SANE invalid-operation exception 
flag but does not affect the RealModes flags. Likewise, 


SetXcpn (INVOP, FALSE);


from RealModes, clears the RealModes invalid-operation flag and does not
affect the SANE flags. Execution of


DivX (x, y);


may set SANE flags but not RealModes flags, and


v := v / u;


may set RealModes flags but not SANE flags.


If you use environmental features, note that the two systems use
different names for corresponding things: for example, INVALID
and INVOP. If you use the wrong name, you may alter a setting of
the other system, so be very careful to use the correct set of
names for each unit.


To minimize confusion, we encourage you to work entirely within one or
the other of the floating-point systems whenever possible. For cases
when both systems are required, conversions between the real and Single
types are presented later in this appendix. Conversions between the
long integer and Comp types appear in Appendix E. The SANE unit
includes procedures to convert between integer and Extended.


In most cases you can decide which floating-point system to use by
asking whether seven-decimal-digit precision, provided by the real type,
is completely adequate to solve the problem at hand. For such a problem
the Apple III Pascal RealModes floating-point offers the advantage of
built-in arithmetic operators and input/output routines for easier
programming and possibly faster execution.


If you need the extra precision or range of the Double, Extended, and 
Comp types or any of the special features of SANE or Elems (such as 
compound-interest functions), then you must use the SANE unit. In 
addition, you may find SANE helpful even when input and output values 
have only single-precision significance. It may be very difficult to 
prove that single-precision arithmetic is sufficient for a given 
calculation; using extended-precision arithmetic for intermediate values 
will often improve the accuracy of single-precision results more than 
virtuoso algorithms would. Likewise, using the extra range of the 
Extended type for intermediate results may yield correct final results 
in the Single type when using the range of the Single type would cause 
an overflow or a catastrophic underflow. 
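
The benefit of wider intermediate values can be seen in a small experiment. The Python sketch below is an illustration under stated assumptions, not SANE code: Python's double precision stands in for Extended, and the hypothetical helper to_single rounds a value to single precision by packing it into 32 bits.

```python
import struct

def to_single(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

term = to_single(0.1)        # single-precision input data
count = 10000

# (a) Every intermediate sum is rounded to single precision.
narrow = 0.0
for _ in range(count):
    narrow = to_single(narrow + term)

# (b) Intermediate sums kept wide; one rounding to single at the end.
wide = 0.0
for _ in range(count):
    wide += term
wide_result = to_single(wide)

exact = term * count         # exact in double: the product needs under 53 bits
print("error, single intermediates:", abs(narrow - exact))
print("error, wide intermediates:  ", abs(wide_result - exact))
```

Both results are delivered in single precision; only the precision of the running sum differs, and the wide running sum gives the smaller error.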


In future versions of Apple III Pascal that incorporate the 
higher-precision types into the syntax of the language, all 
floating-point expressions will be evaluated in Extended, regardless of 
the types of the operands. Hence, results in future systems will be 
consistent with results obtained from SANE. 


Other differences, generally resulting from changes in the IEEE 
Standard, between SANE and RealModes floating-point follow: 


- In SANE, all default halt settings are FALSE (clear), so that 
floating-point exceptions (for example, division-by-zero) do 
not halt a program. 


- SANE does not provide the optional closure mode for projective 
treatment of infinities or warning mode for special handling of 
unnormalized operands. These modes have been removed from the 
IEEE Standard. 


- RealModes floating-point signals underflow when a result is 
sufficiently small: normalizing the result before rounding 
would require an exponent smaller than the minimum exponent for 
the storage-type. SANE signals underflow only when the result 
is sufficiently small and the delivered result is inexact.
Thus, small but exact results do not signal underflow in SANE. 
This difference reflects a change in the definition of 
underflow in the IEEE Standard. 


- SANE has no exception flag specifically for integer conversion
overflow.


Conversions Between Real and Single


The Pascal type real and the SANE type Single are distinct types. We 
encourage you to work entirely with one type or the other whenever 
possible. However, you may wish to use Single arguments in Pascal 
routines calling for real arguments. This will require you to convert 
between types, which you can do by creating two routines: 


function S2R (s : Single) : real;
    var v : record case boolean of
                TRUE  : (s : Single);
                FALSE : (r : real)
            end;
begin { S2R }
    v.s := s;
    S2R := v.r
end { S2R } ;


procedure R2S (r : real; var s : Single);
    var v : record case boolean of
                TRUE  : (s : Single);
                FALSE : (r : real)
            end;
begin { R2S }
    v.r := r;
    s := v.s
end { R2S } ;


These procedures may not be supported in future versions of Apple
Pascal.
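
The variant-record technique in S2R and R2S reinterprets the same 32 bits under two types without converting the value. For readers who want to experiment with the idea, here is a Python sketch using the struct module. The function names are hypothetical, and the little-endian layout with the low-order word first is an assumption made for the illustration, not a statement about Apple Pascal storage order.

```python
import struct

def real_to_single_words(x):
    """Reinterpret a 32-bit float as two 16-bit integers (no value conversion)."""
    return struct.unpack("<2h", struct.pack("<f", x))

def single_words_to_real(words):
    """Reinterpret two 16-bit integers as a 32-bit float."""
    return struct.unpack("<f", struct.pack("<2h", *words))[0]

words = real_to_single_words(1.5)
print(words, single_words_to_real(words))
```

The round trip is exact for any value representable in single precision, because no arithmetic is performed on the bits.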
Appendix D


Managing the SANE Floating-Point
Environment


The SANE floating-point environment consists of the rounding direction,
exception flags, and halt settings.


This appendix provides guidelines for writing a unit of shared black-box
subroutines so that a person using them can expect that a subroutine
call


- will not change rounding direction or halt settings;


- will not clear exception flags and will signal exceptions only
as documented.


The basic idea of the management scheme is to initialize a standard
subroutine environment and to replace the calling program's environment
with the standard subroutine environment while a subroutine runs. The
following code could be included in a unit of subroutines in order to
handle the SANE floating-point environment properly. (Note that if a
subroutine does not call SANE routines that have access to the
floating-point environment, either directly as SetRnd does or indirectly
as AddS does, it does not need any code to manage the floating-point
environment.)


Include in the implementation


const FIRSTXCP = INVALID;
      LASTXCP  = INEXACT;

var StdSbrEnv, TempEnv : Environ;
    Xcp : Exception;


in the initialization


GetEnv (TempEnv);            { TempEnv <-- current environment }
SetRnd (TONEAREST);          { set rounding to nearest -
                               or other direction if desired }
for Xcp := FIRSTXCP to LASTXCP do begin
    SetXcp (Xcp, FALSE);     { clear all exceptions }
    SetHlt (Xcp, FALSE)      { clear all halts }
end;
GetEnv (StdSbrEnv);          { StdSbrEnv <-- configured environment }
SetEnv (TempEnv);            { restore environment }


and in each subroutine that uses SANE


var CallingEnv : Environ;    { environment of calling program }


If specifications do not call for the subroutine to set exception flags,
then at the beginning of the subroutine include


GetEnv (CallingEnv);         { save calling program environment }
SetEnv (StdSbrEnv);          { install standard subroutine environment }


and at the end include


SetEnv (CallingEnv);         { restore calling program environment }


In most cases this is simple and sufficient management of
the SANE environment.


If specifications call for subroutines to set exception flags, then each
such subroutine could begin with


EntryProtocol (CallingEnv);


and end with


ExitProtocol (CallingEnv);


where the implementation includes


procedure EntryProtocol (var CallingEnv : Environ);
begin
    GetEnv (CallingEnv);     { save calling program environment }
    SetEnv (StdSbrEnv)       { install standard subroutine environment }
end;


and


procedure ExitProtocol (CallingEnv : Environ);
    var FlagSet : array [FIRSTXCP..LASTXCP] of boolean;
        Xcp : Exception;
begin
    for Xcp := FIRSTXCP to LASTXCP do
        FlagSet [Xcp] := TestXcp (Xcp);
                             { save exceptions set by subroutine }
    SetEnv (CallingEnv);     { restore calling program environment }
    for Xcp := FIRSTXCP to LASTXCP do
        if FlagSet [Xcp] then SetXcp (Xcp, TRUE)
                             { set subroutine's exceptions: in
                               effect halts set by calling program }
end;
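
The same entry and exit discipline can be tried out with Python's decimal module, whose arithmetic context also carries a rounding mode, exception flags, and traps (the counterpart of halts). This is an analogy offered as a sketch, not SANE code: the function name call_with_standard_env is hypothetical, and raising the signal explicitly is how the sketch imitates the effect of SetXcp (Xcp, TRUE) under the caller's halt settings.

```python
import decimal

def call_with_standard_env(subroutine, *args):
    """Run subroutine in a standard environment, then re-signal its exceptions."""
    caller_env = decimal.getcontext()            # save calling environment
    std_env = decimal.Context(prec=caller_env.prec,
                              rounding=decimal.ROUND_HALF_EVEN,
                              flags=[], traps=[])  # flags clear, no halts
    decimal.setcontext(std_env)                  # entry protocol
    try:
        result = subroutine(*args)
    finally:
        decimal.setcontext(caller_env)           # restore calling environment
    # Exit protocol: set the subroutine's exceptions in the caller's
    # environment so that the caller's own trap settings take effect.
    for signal, was_set in std_env.flags.items():
        if was_set:
            caller_env.flags[signal] = True
            if caller_env.traps[signal]:
                raise signal("signaled by subroutine")
    return result

env = decimal.getcontext()
env.traps[decimal.DivisionByZero] = False        # caller chooses not to halt
q = call_with_standard_env(lambda: decimal.Decimal(1) / decimal.Decimal(0))
print(q, env.flags[decimal.DivisionByZero])
```

The caller's rounding mode and trap settings are untouched by the call, and flags that were already set in the caller's environment are never cleared.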
Appendix E


Conversions Between Long Integer and
Comp


We advise the use of the Comp type instead of long integers because the
Comp type is more fully integrated into the arithmetic. For example, an
accounting application that uses the Comp type for exact wide-precision
calculations could readily be combined with a financial application that
uses the SANE floating-point types and the Elems procedures for
compound-interest calculations. Also, as an integral part of the
Standard Apple Numeric Environment, the Comp type will be supported in
future Apple products. Using the Comp type will make it easier to move
data from one system to another.


If you need to convert between the Apple Pascal long-integer type and
the SANE Comp type, you can use the following code:


const LONGINTSIZE = 25;      { replace 25 by suitable value }

type longint     = integer [36];
     userlongint = integer [LONGINTSIZE];


{ Convert: any integer or long integer --> Comp
  If the long integer exceeds the range of the Comp format,
  then a Comp NaN is delivered. }

procedure LI2C (i : longint; var c : Comp);
    var s : DecStr;          { for intermediate string representation }
begin { LI2C }
    str (i, s);
    Str2C (s, c)
end { LI2C };


{ Convert: Comp --> long integer of length LONGINTSIZE
  Comp NaNs and overflows cause run-time error halts
  (as do overflows in long integer arithmetic). }

procedure C2LI (c : Comp; var i : userlongint);
    var f    : DecForm;      { for formatting decimal }
        ord0 : integer;      { will be ord ('0') }
        d    : Decimal;      { for intermediate decimal form }
        j    : integer;      { loop index }
begin { C2LI }
    f.style := FIXED;        { For speed, the initializations of }
    f.digits := 0;           { f and ord0 could be done globally. }
    ord0 := ord ('0');

    i := 0;
    C2Dec (f, c, d);
    if d.sig [1] = 'N' then halt
    else
        for j := 1 to length (d.sig) do
            i := 10 * i - ord0 + ord (d.sig [j]);
    if d.sgn = 1 then i := -i
end { C2LI };
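
The digit loop in C2LI builds the integer one decimal digit at a time from the sig string of the intermediate Decimal record. The Python sketch below mirrors that loop so the arithmetic can be checked in isolation; the function name is hypothetical, and raising an error stands in for the halt on a NaN.

```python
def decimal_record_to_int(sgn, sig):
    """Convert a sign (0 or 1) and a string of ASCII digits to an integer."""
    if sig[:1] == "N":
        raise ValueError("a NaN has no integer value")
    ord0 = ord("0")
    i = 0
    for ch in sig:
        i = 10 * i - ord0 + ord(ch)    # same update as in C2LI
    return -i if sgn == 1 else i

print(decimal_record_to_int(1, "12345"))   # prints -12345
```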





Appendix F 


Errors in SANE and Elems 


This appendix describes deviations of the current release of the SANE 
and Elems units from the specification in this manual. These deviations 
will not be supported in future releases. 


SANE Unit 


The INVALID exception is set when a Comp NaN is encountered by an
arithmetic operator (AddC, SubC, MulC, or DivC) or a conversion (C2Str,
C2Dec, or C2X).


Elems Unit 


If n is negative and (1 + r)^-n exceeds the largest number in Extended
(about 10^4932), then Annuity (r, n) may erroneously signal OVERFLOW and
produce an infinite result. (Note that such values of r and n are well
beyond the useful range.)


Appendix G


Additional Details about
Binary-Decimal Conversions


Conversions from the Decimal Record


The following remarks apply to Dec2S, Dec2D, Dec2C, and Dec2X.


For maximum accuracy, insert or delete trailing zeros for sig in order
to minimize the magnitude of exp. For example, for 1.0E60 set
sig = '1000000000000000000000000000' (27 zeros) and exp = 33, and for
300E-43 set sig = '3' and exp = -41.
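
The trailing-zero adjustment can be written as a short routine. The Python sketch below reproduces the two examples given above; the function name minimize_exp is hypothetical, and the routine assumes sig holds only ASCII digits with no leading zeros.

```python
SIGDIGLEN = 28     # maximum length of sig in this implementation

def minimize_exp(sig, exp):
    """Shift trailing zeros between sig and exp to make |exp| as small as possible."""
    if sig.strip("0") == "":
        return "0", exp                      # zero: nothing to adjust
    while exp > 0 and len(sig) < SIGDIGLEN:
        sig, exp = sig + "0", exp - 1        # insert a trailing zero
    while exp < 0 and len(sig) > 1 and sig.endswith("0"):
        sig, exp = sig[:-1], exp + 1         # delete a trailing zero
    return sig, exp

print(minimize_exp("1", 60))       # 1.0E60
print(minimize_exp("300", -43))    # 300E-43
```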


If you are writing a parser and must handle a number with more than 28
significant digits, follow these rules:


- Place the implicit decimal point at the right of the 28 most
significant digits.


- If any of the discarded digits to the right of the implicit decimal
point are nonzero, then

    (1) set the INEXACT exception to TRUE, and

    (2) if the number is positive and the rounding mode is UPWARD or
        if the number is negative and the rounding mode is DOWNWARD,
        then take the successor of the last (28th) ASCII character to
        guarantee a correctly rounded result. (The successor of '9'
        is ':'.)
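
These parser rules can be sketched in Python as follows. The function name and the string values used for the rounding mode are hypothetical, the digits are assumed to carry no leading zeros, and the returned flag merely reports that INEXACT should be set (the sketch does not touch any floating-point environment).

```python
SIGDIGLEN = 28

def truncate_digits(digits, exp, negative, rounding):
    """Keep the 28 most significant digits of digits * 10^exp.

    Returns (sig, exp, inexact)."""
    if len(digits) <= SIGDIGLEN:
        return digits, exp, False
    kept, dropped = digits[:SIGDIGLEN], digits[SIGDIGLEN:]
    exp += len(dropped)              # decimal point moves right of 28th digit
    inexact = any(ch != "0" for ch in dropped)
    away = (rounding == "UPWARD" and not negative) or \
           (rounding == "DOWNWARD" and negative)
    if inexact and away:
        # Successor of the last ASCII character; '9' becomes ':'.
        kept = kept[:-1] + chr(ord(kept[-1]) + 1)
    return kept, exp, inexact

print(truncate_digits("9" * 28 + "5", 0, False, "UPWARD"))
```

Taking the ASCII successor rather than propagating a decimal carry keeps the parser simple; the character ':' is then treated as the digit value ten in the last position.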


The choice of 28 for SIGDIGLEN is peculiar to this implementation of
S.A.N.E. Other implementations may use other values.











Appendix H 


Annotated Bibliography 


[1]  Apple Computer, Inc. "Appendix A: The Transcend and Realmodes
     Units" and "Appendix E: Floating-Point Arithmetic," Apple III
     Pascal Programmer's Manual, Volume 2, pp. 2-9, 56-85.

          These appendixes describe the implementation of
          single-precision arithmetic in Apple III Pascal, which was
          based upon Draft 8.0 of the proposed Standard.


[2]  Cody, W. J. "Analysis of Proposals for the Floating-Point
     Standard." IEEE Computer, Vol. 14, No. 3, March 1981, pp. 63-68.

          This paper compares the several contending proposals
          presented to the Working Group.


[3]  Coonen, Jerome T. "An Implementation Guide to a Proposed Standard
     for Floating-Point Arithmetic." IEEE Computer, Vol. 13, No. 1,
     January 1980.

          This paper is a forerunner to the work on the draft
          Standard.


[4]  Coonen, Jerome T. "Underflow and the Denormalized Numbers." IEEE
     Computer, Vol. 14, No. 3, March 1981, pp. 75-87.


[5]  Coonen, Jerome T. "Accurate, Yet Economical Binary-Decimal
     Conversions." To appear in ACM Transactions on Mathematical
     Software.


[6]  Demmel, James. "The Effects of Underflow on Numerical
     Computation." To appear in SIAM Journal on Scientific and
     Statistical Computing.

          These papers examine one of the major features of the
          proposed Standard, gradual underflow, and show how
          problems of bounded exponent range can be handled through
          the use of denormalized values.


[7]  Fateman, Richard J. "High-Level Language Implications of the
     Proposed IEEE Floating-Point Standard." ACM Transactions on
     Programming Languages and Systems, Vol. 4, No. 2, April 1982,
     pp. 239-257.

          This paper describes the significance to high-level
          languages, especially FORTRAN, of various features of the
          IEEE proposed Standard.


[8]  Floating-Point Working Group 754 of the Microprocessor Standards
     Committee, IEEE Computer Society. "A Standard for Binary
     Floating-Point Arithmetic." Proposed to IEEE, 345 East 47th
     Street, New York, NY 10017.

          The implementation of SANE is based upon Draft 10.0 of
          this Standard.


[9]  Floating-Point Working Group 754 of the Microprocessor Standards
     Committee, IEEE Computer Society. "A Proposed Standard for Binary
     Floating-Point Arithmetic." IEEE Computer, Vol. 14, No. 3, March
     1981, pp. 51-62.

          This is Draft 8.0 of the proposed Standard, which was
          offered for public comment. The current Draft 10.0 is
          substantially simpler than this draft; for instance,
          warning mode and projective mode have been eliminated, and
          the definition of underflow has changed. However, the
          intent of the Standard is basically the same, and this
          paper includes some excellent introductory comments by
          David Stevenson, Chairman of the Floating-Point Working
          Group.


[10] Hough, D. "Applications of the Proposed IEEE 754 Standard for
     Floating-Point Arithmetic." IEEE Computer, Vol. 14, No. 3, March
     1981, pp. 70-74.

          This paper is an excellent introduction to the
          floating-point environment provided by the proposed
          Standard, showing how it facilitates the implementation of
          robust numerical computations.


[11] Kahan, W. "Interval Arithmetic Options in the Proposed IEEE
     Floating-Point Arithmetic Standard," Interval Mathematics 1980
     (ed. K. E. L. Nickel). New York: Academic Press, 1980,
     pp. 99-128.

          This paper shows how the proposed Standard facilitates
          interval arithmetic.


[12] Kahan, W., and Coonen, Jerome T. "The Near Orthogonality of
     Syntax, Semantics, and Diagnostics in Numerical Programming
     Environments," The Relationship between Numerical Computation and
     Programming Languages (ed. J. K. Reid). New York: North Holland,
     1982, pp. 103-115.

          This paper describes high-level language issues relating
          to the proposed IEEE Standard, including expression
          evaluation and environment handling.



Glossary 


Application type. A data type used to store data for applications. 


Arithmetic type. A data type used to hold results of calculations 
inside the computer. The SANE arithmetic type, Extended, has greater 
range and precision than the application types, in order to improve the 
mathematical properties of the application types. 


Binary floating-point number. A string of bits representing a sign, an
exponent, and a significand. Its numerical value, if any, is the signed
product of the significand and two raised to the power of its exponent.


Comp type. A 64-bit application data type for storing integral values
of up to 19- or 20-decimal-digit precision. It is used for accounting
applications, among others.


Denormalized number, or denorm. A nonzero binary floating-point number
that is not normalized (that is, whose significand has a leading bit of
zero) and whose exponent is the minimum exponent for the number's
storage type.


Double type. A 64-bit application data type for storing floating-point
values of up to 15- or 16-decimal-digit precision. It is used for
statistical and financial applications, among others.


Environmental settings. The rounding direction, plus the exception
flags and their respective halts.


Exception flag. Each exception has a flag that can be set, cleared and 
tested. It is set when its respective exception occurs and stays set 
until explicitly cleared. 


Exceptions. Special cases, specified by the IEEE Standard, in 
arithmetic operations. The exceptions are INVALID, DIVBYZERO, OVERFLOW, 
UNDERFLOW, and INEXACT. 


Exponent. The part of a binary floating-point number that indicates the
power to which two is raised in determining the value of the number.
The wider the exponent field in a numeric type, the greater range it
will handle.


Extended type. An 80-bit arithmetic data type for storing
floating-point values of up to 19- or 20-decimal-digit precision. SANE 
uses it to hold the results of arithmetic operations. 


Halt. Each exception has a halt that can be set or cleared. If a halt 
is set, the program will halt when the exception occurs. Halts remain 
set until explicitly cleared. 


Infinity. A special bit pattern produced when a floating-point 
operation attempts to produce a number greater in magnitude than the 
largest representable number in a given format. Infinities are signed. 


Integer type. The 16-bit integer data type used in Pascal, typically 
for program indexing. It is not a SANE type but is available to SANE 
users. 


Integral value. A value in a SANE type that is exactly equal to a 
mathematical integer: ..., -2, -1, 0, 1, 2, .... 


NaN (Not a Number). A special bit pattern produced when a 
floating-point operation cannot produce a meaningful result (for 
example, 0/0 produces a NaN). NaNs can also be used for uninitialized 
storage. NaNs propagate through arithmetic operations. 
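
NaN propagation can be seen in a few lines. The sketch uses Python (not the SANE unit) and assumes IEEE double arithmetic. Python raises an error on 0.0/0.0 instead of returning a NaN, so the example uses infinity minus infinity, another operation with no meaningful result.

```python
import math

inf = float("inf")
nan = inf - inf            # no meaningful result, so a NaN is produced

result = nan * 2.0 + 1.0   # the NaN propagates through later arithmetic

print(math.isnan(nan), math.isnan(result))
print(nan == nan)          # a NaN compares unequal even to itself
```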


Normalized number. A binary floating-point number in which all 
significand bits are significant: that is, the leading bit of the 
significand is 1. 


Quiet NaN. A NaN that propagates through arithmetic operations without 
signaling an exception (and hence without halting a program). 


Rounding direction. When the result of an arithmetic operation cannot 
be represented exactly in a SANE type, the computer must decide how to 
round the result. Under SANE, the computer resolves rounding decisions 
in one of four directions, chosen by the user: TONEAREST (the default), 
UPWARD, DOWNWARD, and TOWARDZERO. 
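
The effect of the four directions is easiest to see on a result that cannot be represented exactly. The sketch below is an analogy in Python's decimal module, not SANE code. The mapping of SANE's direction names onto decimal's rounding constants is an assumption made for illustration, and the helper name divide is hypothetical.

```python
from decimal import (Context, Decimal, ROUND_CEILING, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_HALF_EVEN)

# Assumed correspondence between SANE direction names and decimal modes.
MODES = {
    "TONEAREST": ROUND_HALF_EVEN,
    "UPWARD": ROUND_CEILING,
    "DOWNWARD": ROUND_FLOOR,
    "TOWARDZERO": ROUND_DOWN,
}

def divide(a, b, direction, digits=4):
    """Divide a by b, keeping `digits` significant digits in the given direction."""
    ctx = Context(prec=digits, rounding=MODES[direction])
    return ctx.divide(Decimal(a), Decimal(b))

for name in MODES:
    print(name, divide(2, 3, name), divide(-2, 3, name))
```

UPWARD and DOWNWARD differ from TOWARDZERO only in how they treat the sign: the negative quotient moves toward zero under UPWARD and away from zero under DOWNWARD.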


Sign bit. The bit of a Single, Double, Comp, or Extended number that 
indicates the number's sign: 0 indicates a positive number; 1, a 
negative number. 
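
As an illustration only, the sign bit of a 64-bit IEEE double can be read directly from its bit pattern. The sketch is in Python, not Pascal, and sign_bit is a hypothetical helper, not a SANE routine.

```python
import struct

def sign_bit(x):
    """Return the top bit of a 64-bit IEEE double: 0 for positive, 1 for negative."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63

print(sign_bit(2.5), sign_bit(-2.5))
print(sign_bit(0.0), sign_bit(-0.0))   # zero also carries a sign
```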


Signaling NaN. A NaN that signals an INVALID exception when the NaN is 
an operand of an arithmetic operation. If no halt occurs, a quiet NaN 
is produced for the result. No SANE operation creates signaling NaNs. 


Significand. The part of a binary floating-point number that indicates 
where the number falls between two successive powers of two. The wider 
the significand field in a numeric type, the more resolution it will 
have. 
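
The split of a number into significand and exponent can be sketched with Python's math.frexp, assuming IEEE doubles. The helper name split is hypothetical. It rescales the result of frexp so that the significand lies between 1 and 2, which shows where the number falls between two successive powers of two.

```python
import math

def split(x):
    """Return (significand, exponent) with 1 <= |significand| < 2.

    Assumes x is nonzero and finite.
    """
    m, e = math.frexp(x)        # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2.0, e - 1

significand, exponent = split(10.0)
print(significand, exponent)    # 10 lies a quarter of the way from 8 to 16
```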


Single type. A 32-bit application data type for storing floating-point 
values of up to 7- or 8-decimal-digit precision. It is used for 
engineering applications, among others. 
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Two-address operation. An operation performed on two arguments, with 
the result stored in one of the input arguments, destroying its previous 
value. 




Index 


A 


AbsX 37, 54 
accounting applications 10 
add 13 
AddC 13, 52 
AddD 13, 52 
AddS 13, 52 
AddX 13, 52 
Annuity 41, 45, 46, 55 
APPL2.SANE.CODE 7 
APPL3.SANE.CODE 7 
Apple II v, 6, 57-58 
Apple IIe 6 
Apple III v, 6, 57-58, 59 
application data types 
   Comp 6, 9, 11, 51 
   Double 6, 9, 11, 51 
   Single 6, 9, 11, 51 
argument reduction 14, 48 
arithmetic 
   Extended-based 6-7 
   extended precision 7 
   IEEE-conforming 5 
   operations in INTERFACE 52 
   Pascal integer 3 
   16-bit integer 9 
   type 9 
ATanX 48, 55 
auxiliary procedures 37-39 




B 


base-2 logarithm 41 
base-e logarithm 41 
binary 
   approximations 18 
   log 39 
   point 10 
   scale 39 
Binary Floating-Point Arithmetic, Standard 754 for 
   Draft 8.0 1, 6 
   Draft 10.0 1, 6 


C 


C2Dec 21, 3 
C2Str 19, 53 
C2X 17, 53 
canonical form for decimal numbers 20 
cents 10 
ClassC 31, 54 
ClassD 31, 54 
ClassS 31, 54 
ClassX 31, 54 
CmpX 27, 52 
Comp 6, 9, 11, 51, 7 
Comp NaN 18, 68 
comparison(s) 27-28 
   functions 27-8 
   involving infinities and NaNs 28 
compiler option 
   $S 57, 58 
   $U 57, 58 
   $USING 57, 58 
Compound 43, 55 
conversions 17-21, 71 
   between 
      binary and decimal 18-21, 71 
      Extended and other numeric types 17-18, 53 
      long integer and Comp 67, 68 
      numeric types and intermediate decimal type 20-21, 53, 71 
      numeric types and strings 18-20, 53 
      real and Single 61 
   Decimal record 20-21, 1 
   decimal strings into SANE types 18-19 
   SANE types into decimal strings 19-20 
   to and from Extended 17-18 
CosX 47, 55 
counting type 10 
CpySgnX 38, 54 




D 


D2Dec 21, 53 
D2Str 19, 53 
D2X 17, 53 
data types 
   choosing 9 
   Comp 6, 9, 11, 51 
   Double 6, 9, 11, 51 
   Extended 6-7, 9, 11, 51 
   Single 6, 9, 11, 51 
Dec2C 21, 53, 71 
Dec2D 21, 53, 71 
Dec2S 21, 53, 71 
Dec2X 21, 53, 71 
DecForm 3, [illegible], 20, 21, 52 
Decimal 20, 1 
   record conversions 20-21, 71 
   string type in INTERFACE 
DecStr 20, 51 
DECSTRLEN 51 
default modes 6 
DENORMAL 31, 52 
denorms 11 
denormalized number(s) 11, 30-31 
   smallest representable positive 12 
derivatives 24 
differences between SANE and RealModes 59, 60 
digits 52 
DIVBYZERO 35, 52 
DivC 13, 52 
DivD 13, 52 
divide 13 
DivS 13, 52 
DivX 13, 52 
DotProduct example 3-5 
Double 6, 9, 11, 51 
DOWNWARD 33, 52 
Draft 8.0 1, 6 
Draft 10.0 1, 6 


E 


EchoNumber example 2-3 
Elems unit 41-49, 55 
   errors in 69 
   exponentials 42-43 
   financial functions 43-47 
      compound interest 43-45 
      value of an annuity 45-47 
   installing 7 
ELEMS.CODE 7 
Environ 52, 63 




environmental controls 33-36, 63-65 
EQ 27, 52 
errors in SANE and Elems 69 
Exception 34, 52 
exception(s) 
   DIVBYZERO 35, 52 
   flags 6, 34-36, 63 
   INEXACT 36, 52 
   INVALID 35, 52 
   OVERFLOW 35, 52 
   UNDERFLOW 35-36, 52 
exit from loop 31 
exp 20, 51 
Exp1X 42, 55 
Exp2X 42, 55 
exponent 10 
   range 1 
exponentials 42-43 
expression evaluation 23-26 
ExpX 42, 55 
Extended 6-7, 9, 11, 51 
Extended accumulator 23 
Extended-based arithmetic 7 
extended-based expression evaluation 23-26 
extended-precision arithmetic 6, 9 


F 


financial functions 
   compound interest 43-45 
   value of an annuity 45-47 
flags 6, 34-36, 52, 59, 63 
floating-point 
   environments 59, 60, 63 
   storage formats 10, 1 
flush-to-zero 30 
   exit 31 
formatting numeric output 20 
future value 44 
   of an annuity 46 


G 


GE 27, 54 
GEL 27, 52 
GetEnv 33, 54 
GetRnd 33, 54 
GL 27, 52 
global constants 26 
GT 27, 52 










H 


halts 34-36 
Horner's Rule 5 


I 


I2X 17, 53 
IEEE 1 
   remainder function 14 
   Standard 1, 60 
IEEE-conforming arithmetic 5 
INEXACT 36, 52 
INFINITE 31, 52 
infinities 11, 29 
   comparisons involving 28 
INFINITY 29 
infix operators 6 
installing 
   Elems 57 
   SANE 57 
Institute of Electrical and Electronics Engineers See IEEE 
integer 9, 11 
integral 78 
interest 10 
INTERFACE 20, 51 
intermediate decimal type See Decimal 
INVALID 35, 52 


L 


largest representable number 12 
LE 27, 2 
library 
   functions 54 
   procedures 54 
LIBRARY.CODE 7 
Lisa Workshop 6 
Ln1X 42, 55 
LnX 41, 55 
log, binary 39 
Log2X 41, 55 
logarithms 
   base-2 41 
   base-e 41 
   natural 41 
LogbX 39, 54 
long-integer type 67 
LT 27, 52 


M 


mils 10 
mode(s) See also environmental controls 
   default 6 
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seed 48 
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sig 20, 21, 51, 71 
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SIGDIGLEN 20, 21, 51, 71 
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manipulation 37-38 
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